Abstract

Given a capacitated communication network $\mathcal{N}$ and a function $f$ that needs to be computed on $\mathcal{N}$, we study the problem
of generating a computation and communication schedule in $\mathcal{N}$ to maximize the rate of computation of $f$. Shah et al. [IEEE
Journal on Selected Areas in Communications, 2013] studied this problem when the computation schema $\mathcal{G}$ for $f$ is a tree graph.
We define the notion of a schedule when $\mathcal{G}$ is a general DAG and show that finding an optimal schedule is equivalent to finding
the solution of a packing linear program.

We prove that approximating the maximum rate is MAX SNP-hard by looking at the packing LP. For this packing LP we
prove that solving the separation oracle of its dual is equivalent to solving the LP. The separation oracle of the dual reduces to the
problem of finding a minimum cost embedding given $\mathcal{N}$ and $\mathcal{G}$, which we prove to be MAX SNP-hard even when $\mathcal{G}$ has bounded degree
and bounded edge weights and $\mathcal{N}$ has just three vertices. We present a polynomial time algorithm to compute the maximum rate
of function computation when $\mathcal{N}$ has two vertices by reducing the problem to a version of the submodular function minimization
problem.

For general $\mathcal{N}$ we study a restricted class of schedules and its equivalent packing LP. We observe that for this packing LP
also the separation oracle of its dual reduces to finding a minimum cost embedding. A version of this minimum cost embedding
problem has been studied in the literature, and we relate our cost model to the one studied there. We present a quadratic
integer program for the minimum cost embedding problem and its linear programming relaxation based on the earthmover metric.
Applying randomized rounding techniques to the optimal solution of this LP, we give approximation algorithms for some special
classes of graphs. We present constant factor approximation algorithms for the maximum rate when $\mathcal{G}$ is a bounded width layered graph
and when it is a planar graph with bounded out-degree. We also present an $O(D \log n)$-approximation algorithm for an arbitrary DAG
$\mathcal{G}$, where $D$ is the maximum out-degree of a vertex in $\mathcal{G}$ and $n$ is the number of vertices in $\mathcal{N}$. We also prove that if a DAG has
a spanning tree in which every edge is a part of $O(F)$ fundamental cycles then there is an $O(FD)$-approximation algorithm.

Index Terms 

In-network computation, maximum computation rate, minimum cost of computation, MAX-SNP hardness, packing linear 
program. 


I. Introduction 

Consider a classical network application, like search, which requires the assimilation of source data available at various servers
in order to generate the desired output at a particular server, called the sink. This requires the data to be transmitted over the
network of communication links connecting the servers and computation of a function of this data. In-network computation
enables the computation of partial functions of the data on the intermediate servers, which may reduce the time (or the cost, i.e., the
number of transmissions) to get the final function value at the sink. This situation arises in various other network applications
like query processing on a network and information processing in sensor networks, and has been studied extensively, e.g., [?],
[?], [?]. In this paper we consider the problem of finding the communication and in-network computation schedule of a
given arbitrary function of distributed data so as to maximize the rate of computation. We give an example to explain our
problem below.

Example 1. Consider a network $\mathcal{N}$ shown in Fig. 1a with the capacity of each edge being 1 bit/second. Each source vertex $s_i$
has an infinite sequence of one-bit data $\{x_i(k)\}_{k>0}$. A sink vertex $t$ wants to compute a function $f_t(k)$ of this data, where
the sequence of computation ($\mathcal{G}$) is shown in Fig. 1b. Figs. 1c and 1d show two ways of computing $f_t$ on $\mathcal{N}$. In Fig. 1c all
intermediate functions are computed inside $\mathcal{N}$ and $f_t$ is received at 1 bit/second by $t$. In Fig. 1d only $\omega_5$ is computed inside $\mathcal{N}$
and $f_t$ is computed at a rate of 0.5 bits/second.^1 Using both the implementations^2 together, $f_t$ can be computed at 1.5 bits/second.


A natural question to ask in this case is: given $\mathcal{N}$ and $\mathcal{G}$, which of all the possible embeddings to compute $f_t$ should one
use to get the function at the maximum possible rate, and how should the data transfer over the communication links be scheduled?

Pooja Vyavahare and D. Manjunath are affiliated with the Bharti Center for Communications. Their work has been partially supported by grants from DST
and CEFIPRA. Pooja Vyavahare also received support from ITRA. Nutan Limaye is supported by grants from DST, DAAD and CEFIPRA.

^1 As the communication link $(a, t)$ is used to transmit both $x_1(k)$ and $x_4(k)$, each of them is received at rate 0.5 bits/second at $t$.

^2 Called embeddings in this paper.

Fig. 1: (a) Network graph ($\mathcal{N}$). (b) Computation schema ($\mathcal{G}$) for $f_t = x_1(x_2 + x_3) + x_4(x_2 + x_3)$. (c) Implementation 1,
computing $f_t$ at a rate of 1 bit/second. (d) Implementation 2, computing $f_t$ at a rate of 0.5 bits/second.


A. Maximum Rate Computation Schedule 

Recent interest in finding the maximum rate computation schedule is in the context of sensor networks and distributed
computation schemes like MapReduce and Dryad. Computation of symmetric functions over multihop wireless sensor networks
was introduced in [?] and studied in several follow-up works, e.g., [?], [?]. More recently, [?] considered the computation
of such symmetric functions over arbitrary wireline networks. The objective in the preceding works is, like in this paper,
maximizing the computation rate. However, they restrict their attention to symmetric functions, which allows them to perform
the computation in an arbitrary order. Further, in [?], [?] the communication network is a random multihop wireless network
and the results are for the asymptotic regime in the number of sources. While [?] considers wireline networks, they obtain
only an outer bound on the rate of computation. The authors of [?] also describe Steiner tree packing schemes that achieve rates which
are close to this outer bound by showing the approximation factor to be logarithmic in the number of source nodes. Another
line of work, e.g., [?], [?], uses network coding techniques to maximize the rate of computation. We do not use network
coding in our solution techniques.

The closest to the work in this paper is that of [19], [25], both of which are interested in maximizing the computation
rate of general functions over capacitated networks. In [25], the computation schema ($\mathcal{G}$) for computing the function $f$ is
assumed to be a tree. A tree structured $\mathcal{G}$ allows the authors of [25] to obtain the optimum schedule via linear programs that
preserve "functional flow conservation." The functional flow conservation concept of [25] is also used in [19] when $\mathcal{G}$ is a
DAG to find the maximum rate of computation. They give a linear program to find the maximum rate of computation and present a
distributed algorithm to solve it using a Lagrangian dual formulation, but do not find the corresponding schedule. The functional
flow conservation forces two restrictions on the computation schedule. Firstly, any function can be computed only once in
$\mathcal{N}$, and secondly, every edge of $\mathcal{G}$ should be treated as a unique function flow.^3 These restrictions limit the class of allowable
schedules, which makes the rate achieved in [19] sub-optimal.

The problem of collecting data at the sink from various sources can be represented by a tree structured computation schema
$\mathcal{G}$ where all the source nodes are at the leaves and are connected to the root (acting as sink) directly. Thus an optimal schedule
to collect the data at the sink can be obtained by using the techniques of [25], which run in time polynomial in the size of the
input graphs. This implies that the problem of optimal data collection at a single sink is easy to solve. On the other hand,
the problem of distribution of data from one source to multiple sinks has been studied earlier, e.g., [?], under the name of the
fractional Steiner tree packing problem. This problem is proved to be MAX SNP-hard in [?].

In this paper we consider the problem of finding an optimal schedule when $\mathcal{G}$ is a general DAG and there is only one sink
node in the network. We first formalize the notion of a schedule to compute a function $f$ over network $\mathcal{N}$ when $\mathcal{G}$ is a DAG,
which does not have the above mentioned restrictions. We define a routing-computing scheme (and the rate achieved by it) that
computes $f$ in a network (Section II-B). We show that finding an optimal routing-computing scheme is equivalent to finding
the solution of a packing linear program of embeddings, which we call the capacity achieving linear program (CALP) (Theorem 1
in Section III).

B. Relating Max Rate to Min Cost Problem 

Several measures of efficiency of in-network computation, like the cost or delay in computation, have been studied in the
literature [?], [23]. These measures may be used when there is only one data value available with each source and the function
is computed only once. This is also known as one-shot computation of the function. In this case the edges of the network
graph $\mathcal{N}$ do not represent capacities but have weights associated with them. The weight of an edge corresponds either to
the delay incurred or to the cost of transmission of a bit between the two end points of the edge. The authors of [?] prove that
finding a minimum delay embedding is NP-hard when $\mathcal{G}$ is a DAG and present a polynomial time algorithm when $\mathcal{G}$ is a tree.
The problem of finding an embedding for one-shot in-network computation which minimizes the cost has been studied under
various names in the literature, e.g., [?], [?], [?].

^3 The outgoing edges of vertex $\omega_5$ in Fig. 1b are treated as different flows though they both represent the same function.

In this work we relate the complexity of finding the maximum rate schedule to that of finding the minimum cost embedding.
Specifically, we prove that approximating CALP to within a certain constant factor is NP-hard, even when the degree of
each vertex and the weights on the edges of $\mathcal{G}$ are bounded and $\mathcal{N}$ has just three vertices (Theorem ??). This is proved by considering the
dual of this LP (Section IV). We prove that solving CALP is as hard as solving the separation oracle of its dual (Theorem 3).
The separation oracle is a decision problem which reduces to a version of the minimum cost embedding problem studied
earlier for a different cost model in [?] (defined in Section VI-A). Our cost model comes naturally from the definition of the
routing-computing scheme for finding the maximum rate (Example 4). We prove that our version of the minimum cost embedding
problem is MAX SNP-hard even when $\mathcal{G}$ has bounded degree and bounded edge weights, all outgoing edges of a vertex have the
same weight, and $\mathcal{N}$ has just three vertices (Corollary 1). We compare our cost model with the one studied in the literature [?]
and prove that any algorithm which solves the minimum cost embedding problem of [?] gives a $D$-approximation for our
version of the minimum cost embedding problem (Theorem 6), where $D$ is the maximum out-degree of a vertex in $\mathcal{G}$.

C. Approximation Algorithms 

As mentioned above, in Theorem ?? we prove that solving CALP is MAX SNP-hard even when there are only three vertices
in $\mathcal{N}$. The hardness of solving CALP for a network $\mathcal{N}$ with fewer than three vertices is of theoretical interest. Thus, we first
present a polynomial time procedure to solve CALP on $\mathcal{N}$ with two vertices for an arbitrary DAG $\mathcal{G}$ (Section V), thus proving
the dichotomy of hardness of CALP.

In Section VI we present a restricted class of schedules by studying a restricted class of embeddings, called R-Embedding. We
present the equivalent packing LP for these embeddings, called R-CALP, and observe that our hardness results (Theorem 3 and
Theorem ??) also hold for this class of schedules. We use the procedure of Theorem ?? in Section VI to present approximation
algorithms for R-CALP. Using the relation derived in Theorem 6 between different cost models and the result of [16], we show
that there is no polynomial time constant factor approximation for R-CALP (Corollary ??) unless $\mathrm{NP} \subseteq \mathrm{DTIME}(\ldots)$
when $\mathcal{G}$ has unbounded degree and edge weights. Here $p$ is the number of vertices in $\mathcal{G}$.

Since the problem for general $\mathcal{G}$ is NP-hard, we consider some specific structures of $\mathcal{G}$ to get approximation algorithms.
Many well known functions, like the fast Fourier transform (FFT), sorting, or any polynomial function of the input data, can be
represented by a layered computation graph. We present a constant factor approximation algorithm for R-CALP when the width
of each layer of the layered computation graph is bounded (Corollary 4). Then we consider a class of $\mathcal{G}$ that has a spanning
tree such that any edge is a part of at most $O(F)$ fundamental cycles. For an $N$-point FFT computation graph, $F = \log(N)$.
We present a polynomial time $O(FD)$-approximation algorithm to solve R-CALP for such graphs (Corollary ??). Lastly, we
formulate the minimum cost embedding problem as a quadratic integer program and present its linear programming relaxation
based on the earthmover distance metric (Section VI-C). Applying randomized rounding techniques to the optimal solution
of this LP, we present two algorithms (derived from [?]) to approximate R-CALP. The first algorithm gives an $O(D \log n)$-approximation
for general $\mathcal{G}$ (Corollary 6) and the second algorithm gives an $O(D)$-approximation for planar $\mathcal{G}$ (Corollary ??),
where $n$ is the number of vertices in $\mathcal{N}$.


II. Notations and Problem Definition

A communication network is represented by an undirected graph $\mathcal{N} = (V, E)$, where $V = \{u_1, \ldots, u_n\}$ is a set of network
nodes and $E$ is a set of communication links (see Fig. 2a for an example of $\mathcal{N}$). Each link has a non-negative capacity
associated with it. Let $\{s_1, s_2, \ldots, s_\kappa\} \subset V$ be the set of $\kappa$ source nodes, with $s_i$ generating an infinite sequence of data
values from the alphabet $\mathcal{A}_i$. The sink node $t$ needs to compute the function $f : \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_\kappa \to \mathcal{A}_t$. The schema
to compute $f$ is given as a directed acyclic graph $\mathcal{G} = (\Omega, \Gamma)$, where $\Omega$ is the set of nodes, each representing the computation of an
intermediate (with respect to $f$) function of the data, and $\Gamma$ is the set of edges denoting the communication of these functions.
Let $\{\omega_1, \omega_2, \ldots, \omega_\kappa\} \subset \Omega$ be the source nodes and $\omega_p$ be the sink that receives $f(\cdot)$. See Fig. 2b for an example of $\mathcal{G}$.
Let $\{x_i(k)\}_{k \ge 1}$ be the infinite sequence of data values at source $s_i$. We assume that the entire sequence is available at $s_i$ all
the time. Let $f_t(k) := f(x_1(k), \ldots, x_\kappa(k))$. Our interest in this paper is in the computation and communication schedule in
$\mathcal{N}$ that will obtain $f_t(k)$ at the sink node $t$ at the maximum rate. The source nodes of $\mathcal{G}$ have in-degree zero while the out-degree of the
sink node $\omega_p$ is zero. All the other nodes of $\mathcal{G}$ have in-degree greater than zero and out-degree greater than zero.^4 The direction
on the edges in $\mathcal{G}$ represents the direction of the data flow. Without loss of generality we assume that all the outgoing edges
of a node represent the same intermediate function. Let $\Gamma_\theta$ be the set of all edges carrying the intermediate function $\theta$ and
let $\mathcal{A}_\theta$ be its (finite) alphabet. Let $\Theta$ be the set of all intermediate functions in $\mathcal{G}$, and let $w(\theta)$ denote the weight of each
intermediate function $\theta \in \Theta$ in $\mathcal{G}$, with $w(\theta) = \lceil \log(|\mathcal{A}_\theta|) \rceil$.

^4 If the out-degree of all the nodes (except the sink node, which has out-degree zero) is exactly one then the graph $\mathcal{G}$ is a tree.

Fig. 2: (a) Network graph ($\mathcal{N}$). The number near an edge shows its capacity in bits/second. (b) Computation graph ($\mathcal{G}$) for
$f = (x_1 + x_2 x_3)(x_4 + x_2 x_3)$. (c) An embedding $\mathcal{E}_1$ of the function $f$ on $\mathcal{N}$. (d) Another embedding $\mathcal{E}_2$ to compute $f$.


Remark 1. Since each outgoing edge of any vertex $\omega \in \Omega$ carries the same function, the weights associated with all the outgoing
edges of a given $\omega$ are the same.

A path in $\mathcal{N}$ is denoted by a sequence of distinct vertices $\sigma = (u_1, u_2, \ldots, u_l)$ such that $(u_i, u_{i+1}) \in E$ $\forall\, 1 \le i \le l - 1$.
The nodes $u_1$ and $u_l$ are called the start node (start($\sigma$)) and the end node (end($\sigma$)) of the path $\sigma$, respectively. A path can
be of zero length, in which case $\sigma = (u_1)$ is a single vertex and the start and end nodes are the same. $\Sigma$ is the set of all
paths in $\mathcal{N}$. For $\gamma \in \Gamma$ let tail($\gamma$) and head($\gamma$) represent the tail and the head of the edge $\gamma$, respectively. Let $\Phi_\uparrow(\gamma)$ and
$\Phi_\downarrow(\gamma)$ denote, respectively, the immediate predecessors and successors of $\gamma$, i.e., $\Phi_\uparrow(\gamma) = \{\eta \in \Gamma \mid \text{head}(\eta) = \text{tail}(\gamma)\}$ and
$\Phi_\downarrow(\gamma) = \{\eta \in \Gamma \mid \text{tail}(\eta) = \text{head}(\gamma)\}$. For a function $\theta \in \Theta$, let $\Lambda_\uparrow(\theta)$ and $\Lambda_\downarrow(\theta)$ be the functions carried by the predecessor
and successor edges of $\Gamma_\theta$.
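The predecessor and successor sets can be made concrete with a short sketch. The Python snippet below is only an illustration: it stores a computation DAG as a dictionary from edge names to (tail, head) pairs and evaluates $\Phi_\uparrow$ and $\Phi_\downarrow$ directly from the definitions. The labels `g1`..`g9`, `w1`..`w8` and `wp` are hypothetical, and the shape of the DAG is an assumption inferred from how the text describes the schema of Fig. 2b (the figure itself is not reproduced here).

```python
# Computation DAG as {edge name: (tail vertex, head vertex)}.
# ASSUMPTION: labels and shape are inferred from the text, not read off Fig. 2b.
GAMMA = {
    "g1": ("w1", "w6"), "g2": ("w2", "w5"), "g3": ("w3", "w5"),
    "g4": ("w4", "w7"), "g5": ("w5", "w6"), "g6": ("w5", "w7"),
    "g7": ("w6", "w8"), "g8": ("w7", "w8"), "g9": ("w8", "wp"),
}

def tail(g):
    """Vertex at which edge g originates."""
    return GAMMA[g][0]

def head(g):
    """Vertex at which edge g terminates."""
    return GAMMA[g][1]

def predecessors(g):
    """Phi_up(g): edges whose head is the tail of g."""
    return {h for h in GAMMA if head(h) == tail(g)}

def successors(g):
    """Phi_down(g): edges whose tail is the head of g."""
    return {h for h in GAMMA if tail(h) == head(g)}

def same_function(g):
    """Edges leaving the same vertex as g; they carry the same function."""
    return {h for h in GAMMA if tail(h) == tail(g)}

print(predecessors("g5"), successors("g5"), same_function("g5"))
```

In this hypothetical DAG the edges `g5` and `g6` leave the same vertex, so they carry the same intermediate function and form one set $\Gamma_\theta$.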

A. Embedding Definition 

Informally, an embedding of $\mathcal{G}$ on $\mathcal{N}$ gives a way of computing $f$ on $\mathcal{N}$ as per the data flow given by $\mathcal{G}$. Thus, an embedding
of $\mathcal{G}$ on $\mathcal{N}$ can be seen as a function which maps an edge $\gamma \in \Gamma$ to paths in $\mathcal{N}$, where the function carried by $\gamma$ is computed
at the start node of the path and at the end node of the path it is used to generate its successor function. This is formalized
in the following definition.

Definition 1 (Embedding). An embedding of $\mathcal{G}$ on $\mathcal{N}$ is a function $\mathcal{E} : \Gamma \to \mathcal{P}(\Sigma)$.^5 If $\mathcal{E}(\gamma_l) := \{\sigma^l_1, \ldots, \sigma^l_r\}$ then the edge
$\gamma_l$ is mapped to $r$ paths such that the following properties are satisfied.

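1) If tail($\gamma_l$) $= \omega_i$, $i \in [1, \kappa]$, then start($\sigma$) $= s_i$ $\forall \sigma \in \mathcal{E}(\gamma_l)$.

2) If head($\gamma_l$) $= \omega_p$ then end($\sigma$) $= t$ $\forall \sigma \in \mathcal{E}(\gamma_l)$.

3) If $\gamma_i \in \Phi_\uparrow(\gamma_j)$ then there exists a $\sigma^i \in \mathcal{E}(\gamma_i)$ such that end($\sigma^i$) $=$ start($\sigma^j$) $\forall \sigma^j \in \mathcal{E}(\gamma_j)$. Similarly, for every $\sigma^i$ there exists a $\sigma^j$ such
that end($\sigma^i$) $=$ start($\sigma^j$).

4) There are no $i, j \in [1, r]$ such that $i \neq j$ and end($\sigma^l_i$) $=$ end($\sigma^l_j$), $\forall \gamma_l \in \Gamma$.

5) If start($\sigma^l_i$) $\neq$ start($\sigma^l_j$) for $i \neq j \in [1, r]$ then $\sigma^l_i \cap \sigma^l_j = \emptyset$, $\forall \gamma_l \in \Gamma$.

The above mentioned properties of a valid embedding are a direct consequence of the structure of $\mathcal{G}$ and are explained in
Appendix A.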

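Example 2. Consider $\mathcal{N} = (V, E)$ as shown in Fig. 2a. Assume that each source generates symbols from $\mathcal{A} = \{0, 1\}$
and the alphabet of the function $f$ is also $\mathcal{A}$. A schema $\mathcal{G}$ to compute the function $f$ is shown in Fig. 2b. Assume that all
the intermediate functions are also from $\mathcal{A}$, hence $w(\theta) = \lceil \log(2) \rceil = 1$ for all $\theta \in \Theta$. Two of the (multiple) possible
embeddings are shown in Figs. 2c and 2d. For the embedding shown in Fig. 2c, $\mathcal{E}_1(\gamma_1) = s_1x$, $\mathcal{E}_1(\gamma_2) = s_2x$, $\mathcal{E}_1(\gamma_3) = s_3x$,
$\mathcal{E}_1(\gamma_4) = s_4yz$, $\mathcal{E}_1(\gamma_5) = x$, $\mathcal{E}_1(\gamma_6) = xz$, $\mathcal{E}_1(\gamma_7) = xz$, $\mathcal{E}_1(\gamma_8) = z$, $\mathcal{E}_1(\gamma_9) = zt$. For the embedding shown in Fig. 2d,
$\mathcal{E}_2(\gamma_1) = s_1x$, $\mathcal{E}_2(\gamma_2) = \{s_2x, s_2y\}$, $\mathcal{E}_2(\gamma_3) = \{s_3x, s_3y\}$, $\mathcal{E}_2(\gamma_4) = s_4y$, $\mathcal{E}_2(\gamma_5) = x$, $\mathcal{E}_2(\gamma_6) = y$, $\mathcal{E}_2(\gamma_7) = xz$, $\mathcal{E}_2(\gamma_8) = yz$,
$\mathcal{E}_2(\gamma_9) = zt$.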

Observe that if an edge $\gamma_l$ is mapped to two paths, say $\sigma^l_1$ and $\sigma^l_2$, then the same symbol of the function carried by it is
generated twice; once by the vertex start($\sigma^l_1$) and once by the vertex start($\sigma^l_2$). We denote the set of all the embeddings of $\mathcal{G}$
on $\mathcal{N}$ by $\mathbb{E}$. As observed in Example 2, an edge in $\mathcal{N}$ can carry zero or more function types in an embedding. Let
$r^\theta_{\mathcal{E}}(e) := \mathbb{1}\{e \in \sigma \text{ for some } \sigma \in \mathcal{E}(\gamma_i),\ \gamma_i \in \Gamma_\theta\}$ be the indicator function of the transmission of function type $\theta$ over an edge
$e \in E$. Then the total number of times an edge is used in $\mathcal{E}$ is $r_{\mathcal{E}}(e) := \sum_{\theta \in \Theta} r^\theta_{\mathcal{E}}(e)\, w(\theta)$.

^5 Here $\mathcal{P}(S)$ denotes the power set of $S$ except the empty set. In an embedding an edge may get mapped to a path of zero length, which implies that both
its end points are mapped to the same vertex.

Remark 2. An edge $e$ in $\mathcal{N}$ can be a part of the embedding of more than one edge of $\mathcal{G}$, all of which carry the same function
$\theta$. In this case we say that the edge $e$ is used only once (observe $r^\theta_{\mathcal{E}}(e)$) since the edges carry the same function.

The notion of an embedding of $\mathcal{G}$ on $\mathcal{N}$ to compute $f$ is used in [19], [25]. The key difference between these and this paper
is that in the former, an edge in $\mathcal{G}$ is mapped to only one path in $\mathcal{N}$. This is not a restriction when $\mathcal{G}$ is a tree, like in [25].
However, it does reduce the maximum rate when $\mathcal{G}$ is a DAG, as demonstrated by the following example.

Example 3. We continue with Example 2 here. Observe that in $\mathcal{E}_2$ (shown in Fig. 2d) the function $\theta_5$ is computed at two
vertices $x$ and $y$ and used to compute $\theta_6$ at $x$ and $\theta_7$ at $y$. The source $s_2$ sends the function $\theta_2$ on $s_2x$, $s_2y$ and $s_3$ sends $\theta_3$
on $s_3x$, $s_3y$. If the capacities of the links $s_2y$ and $s_3y$ are used completely, the final function $f$ can be computed at the rate of 1
bit per second using $\mathcal{E}_2$. As each edge in $\mathcal{N}$ is used only once, $r_{\mathcal{E}_2}(e) = 1$ $\forall e \in E$.

Note that after the usage of edges by $\mathcal{E}_2$ the residual capacities on the edges of $\mathcal{N}$ are: $c(s_1x) = 0.5$, $c(s_2x) = 0.5$, $c(s_2y) = 0$,
$c(s_3x) = 0.5$, $c(s_3y) = 0$, $c(s_4y) = 0.5$, $c(xz) = 1$, $c(yz) = 0.5$ and $c(zt) = 0.5$. These residual capacities can be used by
$\mathcal{E}_1$ (shown in Fig. 2c) to generate the function $f$ at rate 0.5 bits/second. Note that for all the edges used by $\mathcal{E}_1$, $r_{\mathcal{E}_1}(e) = 1$
except for $xz$, for which $r_{\mathcal{E}_1}(xz) = 2$. Using both the embeddings, the sink $t$ can receive $f$ at the rate of 1.5 bits/second.
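The arithmetic of Example 3 can be checked mechanically. The sketch below is illustrative rather than part of the paper's method: it counts, for each embedding of Example 2, how many distinct function types cross every link (so that an edge shared by schema edges carrying the same function is counted once, as in Remark 2), runs $\mathcal{E}_2$ at its maximum rate, and then runs $\mathcal{E}_1$ on the residual capacities. Two assumptions are made and flagged in the code: the edge names `g1`..`g9` and function names `t1`..`t7`, `f` are hypothetical labels, and the original link capacities are inferred as the residuals quoted in Example 3 plus the one unit consumed by $\mathcal{E}_2$, since Fig. 2a is not reproduced in the text.

```python
from collections import defaultdict
from fractions import Fraction as F

# Function type carried by each schema edge (g5 and g6 both carry theta5).
FUNC = {"g1": "t1", "g2": "t2", "g3": "t3", "g4": "t4", "g5": "t5",
        "g6": "t5", "g7": "t6", "g8": "t7", "g9": "f"}

# Embeddings of Example 2: schema edge -> list of vertex paths in the network.
E1 = {"g1": [["s1", "x"]], "g2": [["s2", "x"]], "g3": [["s3", "x"]],
      "g4": [["s4", "y", "z"]], "g5": [["x"]], "g6": [["x", "z"]],
      "g7": [["x", "z"]], "g8": [["z"]], "g9": [["z", "t"]]}
E2 = {"g1": [["s1", "x"]], "g2": [["s2", "x"], ["s2", "y"]],
      "g3": [["s3", "x"], ["s3", "y"]], "g4": [["s4", "y"]],
      "g5": [["x"]], "g6": [["y"]], "g7": [["x", "z"]],
      "g8": [["y", "z"]], "g9": [["z", "t"]]}

def edge(u, v):
    return frozenset((u, v))  # links are undirected

def usage(embedding):
    """r_E(e) with w(theta) = 1: distinct function types sent over each link."""
    carried = defaultdict(set)
    for g, paths in embedding.items():
        for path in paths:
            for u, v in zip(path, path[1:]):
                carried[edge(u, v)].add(FUNC[g])
    return {e: len(types) for e, types in carried.items()}

def max_rate(embedding, cap):
    """Largest rate the embedding alone supports on capacities cap."""
    return min(cap[e] / r for e, r in usage(embedding).items())

# ASSUMPTION: original capacity = residual listed in Example 3 + 1 unit used by E2.
CAP = {edge("s1", "x"): F(3, 2), edge("s2", "x"): F(3, 2), edge("s2", "y"): F(1),
       edge("s3", "x"): F(3, 2), edge("s3", "y"): F(1), edge("s4", "y"): F(3, 2),
       edge("x", "z"): F(2), edge("y", "z"): F(3, 2), edge("z", "t"): F(3, 2)}

rate2 = max_rate(E2, CAP)
residual = {e: c - rate2 * usage(E2).get(e, 0) for e, c in CAP.items()}
rate1 = max_rate(E1, residual)
print(rate2, rate1, rate1 + rate2)
```

Under these assumptions the sketch reproduces the rates 1, 0.5 and 1.5 of the example. Running embeddings one after another on residual capacities is only a convenient way to verify this instance; it is not claimed to be optimal in general.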

B. Communication and Computation Model 

We saw that an embedding of $\mathcal{G}$ on $\mathcal{N}$ specifies which function $\theta$ is generated at which vertex and transmitted over which
edge in the network. However, this does not specify the exact schedule for computing each $\theta$. Our task is to not only give an
embedding but also give a full schedule. For this we define the notion of a routing-computing scheme.

To define the scheme formally, we first mention the assumptions on the computation of functions and the allowed set
of communication events in the network graph. Let $X$ denote the vector $[x_1, \ldots, x_\kappa]$, and let its $k$-th realization be $X(k) =
[x_1(k), \ldots, x_\kappa(k)]$. Time is slotted and in each time slot an edge $e = (u, v) \in E$ is said to be activated if some information
is transferred from $u$ to $v$. All the edges can be activated simultaneously in any time slot. If the capacity of an edge $e$ is $c(e)$
then at most $\lfloor c(e)\tau \rfloor$ bits can be transferred over it in $\tau$ time slots. We assume that any vertex $u$ transmits all the bits of the
$k$-th realization of a function $\theta$ on the edge $e$ as a single packet of $w(\theta)$ bits. Any $u \in V$ at time slot $\tau$ may perform one of the
following tasks exclusively.

1) Computation event: if there exists $\tau' < \tau$ such that the $k$-th realizations of the predecessor functions of $\theta$ were received or
generated by $u$, then it can generate the $k$-th realization of $\theta$.

2) Communication event: if there exists $\tau' < \tau$ such that the $k$-th realization of a function $\theta$ was either received or generated
by $u$, then it can transmit it over one of its outgoing edges, say $(u, v)$.

3) Receive a function from an incoming edge or do nothing.

We assume that any computation event in the network can happen instantaneously and that time is taken into consideration only
for communication events (which is dictated by the capacity of network edges as mentioned above). Any routing-computing
scheme can be considered as a sequence of $L$ events $R_l$, $1 \le l \le L$, where each event is one of the above mentioned tasks. It
computes $K$ symbols of $f$ at the sink in time $T$ by using a fixed block of $K$ source symbols indexed by $1, 2, \ldots, K$. The rate
of computation of $f$ by the routing-computing scheme is then defined as $K/T$. At any time $\tau \le T$, a node can have a subset
of the universe of data $\mathcal{U} = \Theta \times [1, K]$, where an element $(\theta, k) \in \mathcal{U}$ denotes the $k$-th symbol of the function $\theta$. The sets
$\mathcal{U}_{u,l}, \mathcal{U}_{u,l+1} \subseteq \mathcal{U}$ represent the state of a node $u$ before and after the $l$-th event $R_l$, respectively. In the case of a computation
event the state of only $u$ is changed, and for a communication event only the states of vertices $u$ and $v$ are changed. As seen
in Example 2, a symbol of a function can be computed multiple times in the network and the scheme presented here takes this
into account. Let $m^\theta_{u,k}$ be the number of times the $k$-th symbol of $\theta$ is used or transmitted by $u$ in the overall scheme. We
recall that when $\mathcal{G}$ is a tree, each function symbol is computed only once in the network and the corresponding scheme
is presented in [25].

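Definition 2. A $(\{N_e \mid e \in E\}, K, m^\theta_{u,k})$ routing-computing scheme for $(\mathcal{N}, \mathcal{G})$, given $L \in \mathbb{N}^+$, subsets $\{\mathcal{U}_{u,l} \subseteq \mathcal{U} \mid u \in V, l \in
[1, L+1]\}$ and $m^\theta_{u,k} \in \mathbb{N}$ $\forall u, k, \theta$, is:

1) For $1 \le i \le \kappa$, $\mathcal{U}_{s_i,1} = \{(\theta_i, k) \mid k \in [1, K]\}$; $\mathcal{U}_{u,1} = \emptyset$ $\forall u \in V \setminus \{s_i \mid 1 \le i \le \kappa\}$.

2) For each $l < L + 1$, one of the following holds.

a) Computation event: In this event a node $u$ computes a function $\theta(X(k))$ using $\{\eta(X(k)) \mid \eta \in \Lambda_\uparrow(\theta)\}$. More precisely,
we first set $m^\eta_{u,k} = m^\eta_{u,k} - 1$ $\forall \eta \in \Lambda_\uparrow(\theta)$ and $Z(\mathcal{U}_{u,l}) := \{(\eta, k) \in \mathcal{U}_{u,l} \mid m^\eta_{u,k} = 0\}$. Then the data-sets are updated
as follows: $\mathcal{U}_{u,l+1} = \{(\theta, k)\} \cup \mathcal{U}_{u,l} \setminus Z(\mathcal{U}_{u,l})$; $\mathcal{U}_{v,l+1} = \mathcal{U}_{v,l}$, $\forall v \in V \setminus \{u\}$.

b) Communication event: In this event a function $\theta(X(k))$ is transmitted on the link $uv$. More precisely, we first
set $m^\theta_{u,k} = m^\theta_{u,k} - 1$ and $Z(\mathcal{U}_{u,l}) := \{(\eta, k) \in \mathcal{U}_{u,l} \mid m^\eta_{u,k} = 0\}$. Then the data-sets are updated as follows:
$\mathcal{U}_{v,l+1} = \mathcal{U}_{v,l} \cup \{(\theta, k)\}$, $\mathcal{U}_{u,l+1} = \mathcal{U}_{u,l} \setminus Z(\mathcal{U}_{u,l})$, and $\mathcal{U}_{w,l+1} = \mathcal{U}_{w,l}$ $\forall w \neq u, v$.

3) Final condition: $\mathcal{U}_{t,L+1} = \{(f, k) \mid 1 \le k \le K\}$; $\mathcal{U}_{u,L+1} = \emptyset$ $\forall u \neq t$; $m^\theta_{u,k} = 0$ $\forall u \in V, k \in [1, K], \theta \in \Theta$.

4) Total link usage: Let $r^\theta_e$ be the number of times a function $\theta$ is transmitted over the edge $e \in \mathcal{N}$. Then the total link
usage is given by: $N_e = \sum_{\theta \in \Theta} r^\theta_e\, w(\theta)$.

The scheme uses an edge $e \in E$ for $N_e / c(e)$ time slots to compute $K$ symbols of $f$ at the sink.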

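Definition 3. For a given network $\mathcal{N}$, $\{c(e) \mid e \in E\}$, and a computation graph $\mathcal{G}$, a rate $\lambda$ is said to be $(\mathcal{N}, \mathcal{G})$-achievable if for
every $\epsilon > 0$, there is a $(\{N_e \mid e \in E\}, K, m^\theta_{u,k})$ routing-computing scheme for $(\mathcal{N}, \mathcal{G})$ such that $N_e(\lambda - \epsilon) \le K c(e)$, $\forall e \in E$.
The supremum of $(\mathcal{N}, \mathcal{G})$-achievable rates over all the routing-computing schemes is called the computing capacity for $(\mathcal{N}, \mathcal{G})$,
and is denoted by $C(\mathcal{N}, \mathcal{G})$.^6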
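The inequality in Definition 3 can be read as a bottleneck computation: a scheme that delivers $K$ symbols while sending $N_e$ bits over each link $e$ keeps $e$ busy for $N_e / c(e)$ slots, so it supports every rate up to $\min_e K c(e) / N_e$. The following Python sketch illustrates that reading only; the function names and the two-link data are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def slots_needed(link_usage, capacity):
    """Time for which the busiest link is occupied: max_e N_e / c(e)."""
    return max(Fraction(n) / capacity[e] for e, n in link_usage.items())

def supported_rate(link_usage, capacity, K):
    """Largest lam with N_e * lam <= K * c(e) on every link that is used."""
    return min(Fraction(K) * capacity[e] / n
               for e, n in link_usage.items() if n > 0)

# Hypothetical scheme: K = 3 symbols, N_a = 4 bits on link a, N_b = 3 bits on link b.
N = {"a": 4, "b": 3}
c = {"a": Fraction(2), "b": Fraction(3, 2)}
K = 3

lam = supported_rate(N, c, K)
assert lam == K / slots_needed(N, c)  # both readings give the same rate
print(lam)
```

Both helper functions express the same quantity, which is why the rate can be stated either as $K$ divided by the busiest link's occupancy or as the minimum of $K c(e)/N_e$ over the links.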

Example 3 presented in Section II-A shows that using multiple embeddings and sequencing them appropriately we can
achieve a higher rate of function computation than by just using one embedding. In the next section we give a (packing) linear
program for obtaining the maximum rate of computation using a combination of different embeddings and show that this also
achieves the computing capacity $C(\mathcal{N}, \mathcal{G})$.

III. Capacity Achieving LP (CALP) 


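Capacity Achieving Linear Program (CALP)
Objective: Maximize $R := \sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E})$ subject to

1) Capacity constraints: $\sum_{\mathcal{E} \in \mathbb{E}} r_{\mathcal{E}}(e)\, x(\mathcal{E}) \le c(e)$, $\forall e \in E$.

2) Non-negativity constraints: $x(\mathcal{E}) \ge 0$, $\forall \mathcal{E} \in \mathbb{E}$.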
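To see the packing structure of CALP on a small instance, the sketch below solves the program restricted to the two embeddings $\mathcal{E}_1$ and $\mathcal{E}_2$ of Example 2. It is an illustration under stated assumptions, not the paper's algorithm: the link capacities are inferred from the residual capacities quoted in Example 3, the variable names `x1` and `x2` stand for $x(\mathcal{E}_1)$ and $x(\mathcal{E}_2)$, and the solver is a plain vertex enumeration that only works because there are two variables and the feasible region is bounded. Since the full CALP ranges over all embeddings in $\mathbb{E}$, the value found here is only a lower bound on $R$.

```python
from fractions import Fraction as F
from itertools import combinations

# Rows (a1, a2, b) encode a1*x1 + a2*x2 <= b with x1 = x(E1), x2 = x(E2).
# ASSUMPTION: capacities are inferred from the residuals listed in Example 3.
ROWS = [
    (1, 1, F(3, 2)),  # s1x, s2x, s3x, s4y, yz, zt: each embedding uses the link once
    (0, 1, F(1)),     # s2y, s3y: only E2 uses these links
    (2, 1, F(2)),     # xz: E1 sends two function types, E2 sends one
    (-1, 0, F(0)),    # x1 >= 0
    (0, -1, F(0)),    # x2 >= 0
]

def solve_2var_lp(rows, objective=(1, 1)):
    """Maximize objective . (x1, x2) by checking every vertex of the polygon."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(rows, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel constraints do not meet in a vertex
        # Cramer's rule for the intersection of the two constraint lines.
        x1 = (b * c2 - a2 * d) / F(det)
        x2 = (a1 * d - b * c1) / F(det)
        if all(r1 * x1 + r2 * x2 <= rhs for r1, r2, rhs in rows):
            value = objective[0] * x1 + objective[1] * x2
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best

value, x1, x2 = solve_2var_lp(ROWS)
print(value, x1, x2)
```

On this restricted instance the optimum is attained at $x(\mathcal{E}_1) = 0.5$, $x(\mathcal{E}_2) = 1$, which matches the combined rate of 1.5 bits/second in Example 3.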


Theorem 1. For a given network $\mathcal{N}$ and computation DAG $\mathcal{G}$, CALP achieves a rate $R$ which is equal to the computing
capacity $C(\mathcal{N}, \mathcal{G})$ for $(\mathcal{N}, \mathcal{G})$.

Proof: We prove the theorem in two steps. First we show achievability, i.e., we show that for any $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ that
satisfies the constraints of the CALP, the rate $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E})$ is $(\mathcal{N}, \mathcal{G})$-achievable. Next we show that for any $(\{N_e \mid e \in E\}, K, m^\theta_{u,k})$
routing-computing scheme for $(\mathcal{N}, \mathcal{G})$ satisfying $N_e \lambda \le K c(e)$, $\forall e \in E$, there exists $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ satisfying the constraints
of the CALP such that $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) = \lambda$. The authors of [25] defined a routing-computing scheme that works only for tree structured $\mathcal{G}$,
where any intermediate function is computed only once in the network, and showed its equivalence to the corresponding CALP
using similar arguments.

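Step 1 of the proof: In this step, starting with a set of embeddings which satisfies the CALP constraints, we generate a
routing-computing scheme which achieves the sum rate of these embeddings. Let $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ be the number of symbols
of the function $f$ generated by the various embeddings such that it satisfies the constraints of CALP. Since the rational numbers are
dense we can find, for any $\epsilon > 0$, a set of rational flows $\{x'(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ with $x'(\mathcal{E}) \le x(\mathcal{E})$ such that $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) - \sum_{\mathcal{E} \in \mathbb{E}} x'(\mathcal{E}) \le \epsilon$. We denote
the least common multiple of the denominators of $\{x'(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ by $d$. Let us take $K = d \sum_{\mathcal{E} \in \mathbb{E}} x'(\mathcal{E})$. For every edge $e \in E$
let $N_e = d \sum_{\mathcal{E} \in \mathbb{E}} r_{\mathcal{E}}(e)\, x'(\mathcal{E})$. An embedding tells us where any function is computed in the network and on which edges it is
transmitted. Let $L(\mathcal{E}) = \sum_{e \in E} \sum_{\theta \in \Theta} r^\theta_{\mathcal{E}}(e)$ denote the number of symbols of different functions transmitted in the embedding $\mathcal{E}$,
where $r^\theta_{\mathcal{E}}(e)$ is the indicator variable for the transmission of function type $\theta$ over edge $e$ in embedding $\mathcal{E}$. Similarly let $g_{\mathcal{E}}(\theta)$
be the number of times a function $\theta \in \Theta \setminus \{x_i \mid i \in [1, \kappa]\}$ is computed under the embedding $\mathcal{E}$. More formally,

$$g_{\mathcal{E}}(\theta) := \Big| \bigcup_{\gamma_1, \gamma_2 \in \Gamma_\theta} \{ \text{start}(\sigma_i) \neq \text{start}(\sigma_j) \mid \forall \sigma_i \in \mathcal{E}(\gamma_1) \text{ and } \sigma_j \in \mathcal{E}(\gamma_2) \} \Big|,^7$$

that is, $g_{\mathcal{E}}(\theta)$ counts the distinct vertices at which $\theta$ is generated. The total number of computations of all the functions in $\mathcal{E}$ is $g(\mathcal{E}) := \sum_{\theta \in \Theta} g_{\mathcal{E}}(\theta)$.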

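Now we will construct a routing-computing scheme with the following properties.

1) It computes $K = d \sum_{\mathcal{E} \in \mathbb{E}} x'(\mathcal{E})$ realizations of the function, with $d\, x'(\mathcal{E})$ realizations computed by embedding $\mathcal{E}$.

2) It uses any edge $e$ to communicate $N_e = d \sum_{\mathcal{E} \in \mathbb{E}} r_{\mathcal{E}}(e)\, x'(\mathcal{E})$ bits, where $r_{\mathcal{E}}(e) = \sum_{\theta \in \Theta} r^\theta_{\mathcal{E}}(e)\, w(\theta)$.

3) It has $L = d \sum_{\mathcal{E} \in \mathbb{E}} L(\mathcal{E})\, x'(\mathcal{E}) + d \sum_{\mathcal{E} \in \mathbb{E}} g(\mathcal{E})\, x'(\mathcal{E})$ events, out of which $d \sum_{\mathcal{E} \in \mathbb{E}} L(\mathcal{E})\, x'(\mathcal{E})$ are
communication events and $d \sum_{\mathcal{E} \in \mathbb{E}} g(\mathcal{E})\, x'(\mathcal{E})$ are computation events.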
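The parameters of the constructed scheme follow mechanically from the rational flows. The sketch below is a hedged illustration of properties 1) and 2): it takes rational values $x'(\mathcal{E})$ and per-embedding link usages $r_{\mathcal{E}}(e)$, computes $d$, $K$ and $N_e$, and checks that $N_e \le d\,c(e)$. The function name `scheme_parameters` and the three-link instance are hypothetical; the numbers are chosen to resemble Example 3 with capacities inferred from its residuals, which is an assumption.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9

def scheme_parameters(x_prime, usage):
    """Return (d, K, N) for rational flows x'(E) and link usages r_E(e)."""
    d = lcm(*(x.denominator for x in x_prime.values()))
    K = d * sum(x_prime.values())
    links = {e for r in usage.values() for e in r}
    N = {e: d * sum(usage[E].get(e, 0) * x_prime[E] for E in x_prime)
         for e in links}
    # d * x'(E) is an integer for every E, so K and every N_e are integers.
    return d, int(K), {e: int(n) for e, n in N.items()}

# Hypothetical instance: two embeddings, three of the links of Example 3.
x_prime = {"E1": Fraction(1, 2), "E2": Fraction(1)}
usage = {"E1": {"xz": 2, "zt": 1}, "E2": {"xz": 1, "s2y": 1, "zt": 1}}
cap = {"xz": Fraction(2), "zt": Fraction(3, 2), "s2y": Fraction(1)}  # assumed

d, K, N = scheme_parameters(x_prime, usage)
assert all(N[e] <= d * cap[e] for e in N)  # capacity constraints scaled by d
print(d, K, N)
```

With $d = 2$ the scheme computes $K = 3$ symbols per block, one through $\mathcal{E}_1$ and two through $\mathcal{E}_2$.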

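Note that achievability (Definition 3) requires this routing-computing scheme to satisfy $N_e \big( \sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) - \epsilon \big) \le K c(e)$, $\forall e \in E$. If $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ is a solution of the CALP
it satisfies the capacity constraints; thus, with the rational flows chosen so that $x'(\mathcal{E}) \le x(\mathcal{E})$,

$$\sum_{\mathcal{E} \in \mathbb{E}} r_{\mathcal{E}}(e)\, x'(\mathcal{E}) \le c(e) \quad \forall e \in E.$$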

^6 A similar definition appears in [25]; however, in their case $\mathcal{G}$ is a tree.

^7 Note that in the above equation we need to consider all the values of $\gamma_1$ and $\gamma_2$, including $\gamma_1 = \gamma_2$, and the generation of the source sequence $x_i$ is not
considered as a computation in the embedding.
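Using the values of $N_e$ and $K$ for this scheme we get $N_e = d \sum_{\mathcal{E} \in \mathbb{E}} r_{\mathcal{E}}(e)\, x'(\mathcal{E}) \le d\, c(e) = \frac{K c(e)}{\sum_{\mathcal{E} \in \mathbb{E}} x'(\mathcal{E})}$. Thus the routing-computing
scheme satisfies $N_e \big( \sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) - \epsilon \big) \le N_e \sum_{\mathcal{E} \in \mathbb{E}} x'(\mathcal{E}) \le K c(e)$, $\forall e \in E$. This guarantees the achievability of the computing
rate $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E})$. We now show the sequencing of communication and computation events in the routing-computing scheme.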

For this we first compute a total ordering on the vertices and edges of the computation DAG using the underlying DAG
ordering. Using this ordering one can inductively order the vertices and edges of the network graph $\mathcal{N}$ which are used in an
embedding $\mathcal{E}$. Note that every vertex and edge of $\mathcal{N}$ used in $\mathcal{E}$ has a function $\theta$ associated with it, and the total number of edges
(for transmission) and vertices (for computation) used by it is $L(\mathcal{E}) + g(\mathcal{E})$. We denote the ordering (and the corresponding
function) generated by an embedding $\mathcal{E}$ by

$$\phi_{\mathcal{E}} : [1 : L(\mathcal{E}) + g(\mathcal{E})] \to (V \times \Theta) \cup (E \times \Theta).$$
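One way to obtain such an ordering is to process the functions in a topological order of the computation DAG and, for each function, list its computation events followed by its transmissions hop by hop. The Python sketch below illustrates this for the embedding $\mathcal{E}_1$ of Example 2. It is an assumed construction offered for intuition, not necessarily the ordering used by the authors; the function labels `t1`..`t7`, `f` and the dictionaries `PRED`, `COMPUTED_AT` and `PATHS` are hypothetical encodings of what the text says about Fig. 2.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

# Predecessor functions of each non-source function (assumed shape of the schema).
PRED = {"t5": {"t2", "t3"}, "t6": {"t1", "t5"},
        "t7": {"t4", "t5"}, "f": {"t6", "t7"}}

# Embedding E1 at function level: where each function is computed and the
# network paths along which it is forwarded.
COMPUTED_AT = {"t5": ["x"], "t6": ["x"], "t7": ["z"], "f": ["z"]}
PATHS = {"t1": [["s1", "x"]], "t2": [["s2", "x"]], "t3": [["s3", "x"]],
         "t4": [["s4", "y", "z"]], "t5": [["x", "z"]], "t6": [["x", "z"]],
         "t7": [], "f": [["z", "t"]]}

def event_order(pred, computed_at, paths):
    """List events (vertex, theta) or ((u, v), theta) in a DAG-consistent order."""
    events = []
    for theta in TopologicalSorter(pred).static_order():
        for u in computed_at.get(theta, []):   # computation events
            events.append((u, theta))
        sent = set()
        for path in paths.get(theta, []):      # communication events, hop by hop
            for hop in zip(path, path[1:]):
                if hop not in sent:            # one function on one link counts once
                    sent.add(hop)
                    events.append((hop, theta))
    return events

events = event_order(PRED, COMPUTED_AT, PATHS)
print(len(events))
for position, event in enumerate(events, start=1):
    print(position, event)
```

For $\mathcal{E}_1$ this yields eight communication events and four computation events, i.e. $L(\mathcal{E}_1) + g(\mathcal{E}_1) = 12$ positions. Every function's events appear after all events of its predecessor functions, which is the property the inductive argument relies on.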

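Now we find the total number of times a function $\theta$ is used or transmitted by a vertex $u$ in the network in an embedding
$\mathcal{E}$ as follows.

$$m^\theta_u(\mathcal{E}) := \sum_{v \in V} \cdots \; + \sum_{\eta \in \Lambda_\downarrow(\theta)} \cdots$$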

We define the sets $\mathcal{U}_{u,l} \subseteq \mathcal{U}$, $\forall u \in V$ and $\forall l \in [1, L+1]$, below in an inductive fashion.

1) For $1 \le i \le \kappa$, $\mathcal{U}_{s_i,1} = \{(\theta_i, k) \mid k \in [1, K]\}$, and $\mathcal{U}_{u,1} = \emptyset$ for all $u \in V \setminus \{s_i \mid 1 \le i \le \kappa\}$.

2) Let us fix an arbitrary order on the embeddings, say $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_{|\mathbb{E}|}$. Recall that the $i$-th embedding generates $d\, x'(\mathcal{E}_i)$
function symbols. We describe the procedure for the $j$-th symbol generated by the $i$-th embedding. The same
procedure is run for each symbol of every embedding by following the order of embeddings. Set $m^\theta_{u,k} = m^\theta_u(\mathcal{E}_i)$ for all
$\theta \in \Theta$. The scheme for this $j$-th symbol produced by the $i$-th embedding has $L(\mathcal{E}_i) + g(\mathcal{E}_i)$ events. We give the
procedure for the $l$-th event of this symbol inductively by assuming that all the events up to the generation of the $(j-1)$-th
symbol by $\mathcal{E}_i$ and the $(l-1)$-th event of the $j$-th symbol are correct. Then at the $l$-th event do one of the following.

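a) If $\phi_{\mathcal{E}_i}(l) = (u, \theta)$, then the $l$-th event is a computation of $\theta$ at $u$. The condition $\Lambda_\uparrow(\theta) \subseteq \mathcal{U}_{u,l}(k)$ holds because of
the assumption of the correctness of the earlier steps. We set $m^\eta_{u,k} = m^\eta_{u,k} - 1$ $\forall \eta \in \Lambda_\uparrow(\theta)$ and $Z(\mathcal{U}_{u,l}) := \{(\eta, k) \in
\mathcal{U}_{u,l} \mid m^\eta_{u,k} = 0\}$. The data-sets are redefined as follows: $\mathcal{U}_{u,l+1} = \{(\theta, k)\} \cup \mathcal{U}_{u,l} \setminus Z(\mathcal{U}_{u,l})$, $\mathcal{U}_{v,l+1} = \mathcal{U}_{v,l}$, $\forall v \in V \setminus \{u\}$.
Note that this is in accordance with condition 2(a) of Definition 2.

b) If $\phi_{\mathcal{E}_i}(l) = ((u, v), \theta)$, then the $l$-th event is a communication of $\theta(X(k))$ from $u$ to $v$ over the edge $(u, v)$.
The condition $\theta \in \mathcal{U}_{u,l}(k)$ holds because of the assumption. We first set $m^\theta_{u,k} = m^\theta_{u,k} - 1$ and $Z(\mathcal{U}_{u,l}) := \{(\eta, k) \in
\mathcal{U}_{u,l} \mid m^\eta_{u,k} = 0\}$. Then redefine the data-sets as follows: $\mathcal{U}_{u,l+1} = \mathcal{U}_{u,l} \setminus Z(\mathcal{U}_{u,l})$, and $\mathcal{U}_{v,l+1} = \mathcal{U}_{v,l} \cup \{(\theta, k)\}$. For
any $w \neq u, v$, $\mathcal{U}_{w,l+1} = \mathcal{U}_{w,l}$. Note that this is in accordance with condition 2(b) of Definition 2.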

It is easy to verify by running the above procedure inductively the final conditions, Ut^L+i = < k < K},14 u,l+i = 

%\/u ^ t and TO® = 0 Vu, k, 9 are met. Similarly the link usage Ng = all e G FZ is also satisfied, where 

rf = |{Z G [1,F] : Z is a communication over e for function 0}|. 

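Step 2 of the proof: Now we prove that for any routing-computing scheme for $(\mathcal{N}, \mathcal{G})$ with link usage $\{N_e \mid e \in E\}$ satisfying $N_e \lambda \le K c(e)$, $\forall e \in E$, there exists $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ satisfying the constraints of CALP such that $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) = \lambda$.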

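In any routing-computing scheme, looking at the communication and computation events corresponding to the $k$-th symbol of all the functions, one can easily get an embedding. Let us say that for the $k$-th computation the scheme uses embedding $\mathcal{E}^{(k)} \in \mathbb{E}$. For each $e \in E$, the $k$-th computation requires communication of $r^{\theta}_{\mathcal{E}^{(k)}}(e)$ bits over $e$ of function type $\theta$. Usage of the link $e$ by the embedding can be computed by $r_{\mathcal{E}^{(k)}}(e) = \sum_{\theta \in \Theta} r^{\theta}_{\mathcal{E}^{(k)}}(e) w(\theta)$. Thus the total link usage by the scheme can be written as

$\sum_{k=1}^{K} r_{\mathcal{E}^{(k)}}(e) = N_e, \quad \forall e \in E.$ (1)

Let $x(\mathcal{E}) := \lambda \, |\{k : \mathcal{E}^{(k)} = \mathcal{E}\}| / K$. Note that by definition, $x(\mathcal{E}) \ge 0$ and $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) = \lambda$. Equation (1) can be written as

$\sum_{\mathcal{E} \in \mathbb{E}} \frac{K x(\mathcal{E})}{\lambda} r_{\mathcal{E}}(e) = N_e$

$\sum_{\mathcal{E} \in \mathbb{E}} K x(\mathcal{E}) r_{\mathcal{E}}(e) = \lambda N_e \le K c(e)$

$\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) r_{\mathcal{E}}(e) \le c(e)$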




So, $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$ satisfies the conditions of the CALP. Thus we get a solution of CALP with $\sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E}) = \lambda$ from the routing-computing scheme.
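The conversion in Step 2 can be sketched numerically. The snippet below is only an illustration under stated assumptions: the embedding names `E1`, `E2`, the per-link usage table, and the helper names `rates_from_scheme` and `link_load` are hypothetical and do not come from the paper. It sets $x(\mathcal{E})$ to $\lambda$ times the fraction of symbols computed with $\mathcal{E}$ and evaluates the left hand side of the capacity constraints.

```python
from collections import Counter
from fractions import Fraction


def rates_from_scheme(embedding_per_symbol, lam):
    """x(E) = lam * (number of symbols computed with E) / K."""
    K = len(embedding_per_symbol)
    counts = Counter(embedding_per_symbol)
    return {name: Fraction(lam) * n / K for name, n in counts.items()}


def link_load(x, usage):
    """Return sum_E x(E) * r_E(e) for every link e that is used."""
    load = {}
    for name, rate in x.items():
        for e, r in usage[name].items():
            load[e] = load.get(e, 0) + rate * r
    return load


# Hypothetical toy data: r_E(e) for two embeddings on a three-node network.
usage = {
    "E1": {("s", "t"): 2},
    "E2": {("s", "v"): 1, ("v", "t"): 1},
}
scheme = ["E1", "E2", "E2", "E1"]  # embedding used for symbols k = 1..4
lam = Fraction(1, 2)               # assumed rate of the scheme

x = rates_from_scheme(scheme, lam)
load = link_load(x, usage)
```

With these numbers the rates sum to $\lambda$, and each entry of `load` is the quantity that has to stay below $c(e)$.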


IV. Complexity of CALP 

In this section we prove that solving CALP is MAX SNP-hard even when $\mathcal{G}$ has bounded degree and bounded edge weights. We first prove that if there is an $\alpha$-approximation for CALP then there is an $\alpha$-approximation algorithm for the minimum cost embedding problem. We then give a linear reduction from SIMPLE MAX CUT to the problem of finding a minimum cost embedding. Because SIMPLE MAX CUT is a MAX SNP-hard problem, we get the following theorem.

Theorem 2. For a DAG $\mathcal{G}$ and arbitrary $\mathcal{N}$, solving CALP is MAX SNP-hard even when: (1) Each vertex of $\mathcal{G}$ (except for the sink) has bounded ($O(1)$) degree. (2) Every edge of $\mathcal{G}$ has bounded ($O(1)$) weight. (3) All the outgoing edges of a vertex of $\mathcal{G}$ have the same weight. (4) The network graph $\mathcal{N}$ has only three vertices.

Proof Outline: We give the reduction in several steps. The outline of the proof is as follows.

1) We first consider the dual of CALP and its separation oracle, which is a version of the problem of finding the minimum cost embedding.

2) We then prove that there is an $\alpha$-approximation for CALP if and only if there is an $\alpha$-approximation for the separation oracle of its dual. This implies that if the minimum cost embedding problem is hard to approximate beyond some factor then finding the maximum rate of computation is also hard to approximate.

3) Next we prove MAX SNP-hardness by reducing the SIMPLE MAX CUT problem to minimum cost embedding. We use a series of gadgets to obtain the desired properties of the computation graph $\mathcal{G}$.

A. Step 1 of the proof 

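First we consider the dual of CALP, which is presented below. Recall that $\mathbb{E}$ represents the set of all possible embeddings of $\mathcal{G}$ on $\mathcal{N}$ and $r_{\mathcal{E}}(e)$ represents the number of times an edge $e \in E$ is used by the embedding $\mathcal{E}$.


Dual of CALP

Objective: Minimize $C = \sum_{e \in E} c(e) y(e)$ subject to

1) Cost constraints: $\sum_{e \in E} r_{\mathcal{E}}(e) y(e) \ge 1$, $\forall \mathcal{E} \in \mathbb{E}$, where $r_{\mathcal{E}}(e) = \sum_{\theta \in \Theta} r^{\theta}_{\mathcal{E}}(e) w(\theta)$.

2) Non-negativity constraints: $y(e) \ge 0$, $\forall e \in E$.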


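Note that $r_{\mathcal{E}}(e)$ can be computed given the embedding $\mathcal{E}$. Given a vector $\{x(e) \mid e \in E\}$ the total cost of an embedding can be defined as:

$C'(\mathcal{E}) = \sum_{e \in E} r_{\mathcal{E}}(e) x(e) = \sum_{e \in E} \Big( \sum_{\theta \in \Theta} r^{\theta}_{\mathcal{E}}(e) w(\theta) \Big) x(e).$ (2)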

Observe that for any given solution of the dual of CALP, $\{y(e) \mid e \in E\}$, the cost constraint corresponding to an embedding $\mathcal{E}$ is $C'(\mathcal{E}) \ge 1$. Let us now look at the separation oracle of the dual of CALP.

Definition 4 (Separation oracle of Dual of CALP). Instance: A network graph $\mathcal{N}$, a computation DAG $\mathcal{G}$, weight function $\{w(\theta) \mid \theta \in \Theta\}$ and a vector $\{y(e) \mid e \in E\}$. Output: If $C'(\mathcal{E}) \ge 1$, $\forall \mathcal{E} \in \mathbb{E}$, then output "yes", else output "no" and an embedding $\mathcal{E}$ such that $C'(\mathcal{E}) < 1$.
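As a minimal sketch of Definition 4, the oracle can be written down directly when the set of embeddings is small enough to list explicitly. This brute-force version is only an illustration: in general the set of embeddings is exponentially large, which is exactly why a minimum cost embedding algorithm is needed. The names `embedding_cost`, `separation_oracle` and the toy data are assumptions made for the example.

```python
def embedding_cost(usage, y):
    """Cost of one embedding: sum over links e of r_E(e) * y(e)."""
    return sum(r * y[e] for e, r in usage.items())


def separation_oracle(embeddings, y):
    """Return ("yes", None) if every embedding costs at least 1,
    otherwise ("no", name) where name is a cheapest (violating) embedding."""
    cheapest = min(embeddings, key=lambda n: embedding_cost(embeddings[n], y))
    if embedding_cost(embeddings[cheapest], y) >= 1:
        return "yes", None
    return "no", cheapest


# Hypothetical toy instance: r_E(e) for two embeddings.
embeddings = {
    "E1": {("s", "t"): 2},
    "E2": {("s", "v"): 1, ("v", "t"): 1},
}
y_bad = {("s", "t"): 0.5, ("s", "v"): 0.25, ("v", "t"): 0.25}
y_ok = {("s", "t"): 0.5, ("s", "v"): 0.5, ("v", "t"): 0.5}

answer_bad = separation_oracle(embeddings, y_bad)
answer_ok = separation_oracle(embeddings, y_ok)
```

For `y_bad` the embedding `E2` has cost 0.5 and is returned as the violated constraint; for `y_ok` both embeddings cost exactly 1 and the oracle answers "yes".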

Note that to solve the above problem, it suffices to compute the minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$. A version of the minimum cost embedding problem has been studied in [27]. We formally define this cost in Section VI and then derive its relation to the cost defined in Equation (2). In the next section we prove the relation between CALP and the problem of finding the minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$.

B. Step 2 of the proof 

In this section we prove the equivalence between the problem of solving CALP and the separation oracle of its dual, which is to find the minimum cost embedding. In the process we present a procedure to find a solution of CALP if we have an algorithm to solve the minimum cost embedding problem. This will be used in Section VI to approximately solve CALP. Specifically we prove the following theorem.

Theorem 3. There is a polynomial time $\alpha$-approximation algorithm to solve CALP if and only if there is a polynomial time $\alpha$-approximation algorithm for finding the minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$.

Proof: The arguments to prove the theorem are similar to the ones presented in Theorem 4 of [14], where a packing Steiner tree LP is considered. The main difference between their packing LP and our LP is that in their case the coefficients of the dual variables $\{y(e) \mid e \in E\}$ are 0/1. In our LP (the dual of CALP) the coefficient is $r_{\mathcal{E}}(e)$, which could be any positive number depending on the embedding $\mathcal{E}$.

In the forward direction, starting from a polynomial time $\alpha$-approximation algorithm, say $A$, for the minimum cost embedding, we give a polynomial time $\alpha$-approximation algorithm to solve the CALP. First we add the inequality $\sum_{e \in E} c(e) y(e) \le R$ to the constraints of the dual of CALP and, using the ellipsoid algorithm and binary search (over various values of $R$), we find the minimum value of $R$, say $R^*$, for which the dual is feasible. We use the algorithm $A$ for the separation oracle of the dual while running the ellipsoid method. The separation oracle works as follows: First, for a given $\{y(e)\}$ it checks the inequality $\sum_{e \in E} c(e) y(e) \le R$. If this is true then it uses algorithm $A$ to find the minimum cost embedding $\mathcal{E}$ of cost $C'(\mathcal{E})$. If $C'(\mathcal{E}) < 1$ then we know that $\{y(e)\}$ is not a feasible solution of the dual and $\mathcal{E}$ gives a separating hyperplane. But if $C'(\mathcal{E}) \ge 1$ then $\{y(e)\}$ is considered to be a feasible solution and the corresponding dual (with the added inequality) is considered feasible. Since algorithm $A$ is only an $\alpha$-approximation of the optimal minimum cost embedding, the above conclusion might be incorrect and the dual might indeed be infeasible. However, in this case $\{\alpha y(e)\}$ gives a feasible solution with $R$ replaced by $\alpha R$. Note that this is possible because the right hand sides of the cost constraints are all 1 in the dual. Therefore if $R^*$ is the minimum value of $R$ found feasible by the ellipsoid method then we know that the optimal solution of the dual lies in the range between $R^*$ and $\alpha R^*$. Thus by strong duality of linear programming this method gives us an $\alpha$-approximate value of the solution of CALP.
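The scaling step in the argument above can be spelled out as a short derivation. It assumes, for concreteness, that $A$ returns an embedding $\mathcal{E}_A$ whose cost is at most $\alpha$ times the minimum, with $\alpha \ge 1$; the symbol $\mathcal{E}_A$ is introduced here only for illustration.

```latex
\begin{align*}
  C'(\mathcal{E}_A) \ge 1
    \;&\Longrightarrow\;
    \min_{\mathcal{E} \in \mathbb{E}} C'(\mathcal{E})
      \;\ge\; \frac{C'(\mathcal{E}_A)}{\alpha} \;\ge\; \frac{1}{\alpha} \\
    \;&\Longrightarrow\;
    \sum_{e \in E} r_{\mathcal{E}}(e)\,\bigl(\alpha\, y(e)\bigr) \;\ge\; 1
      \quad \forall\, \mathcal{E} \in \mathbb{E}, \\
  \sum_{e \in E} c(e)\, y(e) \le R
    \;&\Longrightarrow\;
    \sum_{e \in E} c(e)\,\bigl(\alpha\, y(e)\bigr) \;\le\; \alpha R .
\end{align*}
```

So whenever the approximate oracle accepts $\{y(e)\}$, the scaled vector $\{\alpha y(e)\}$ is feasible for the dual with bound $\alpha R$, which is why the dual optimum lies between $R^*$ and $\alpha R^*$.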

To find the actual solution corresponding to this value, i.e., $\{x(\mathcal{E}) \mid \mathcal{E} \in \mathbb{E}\}$, we do the following: We know that the ellipsoid method ends in polynomial time, giving polynomially many separating hyperplanes to reach the $\alpha$-approximate solution. These hyperplanes are sufficient to show that the solution of the dual is at least $R^*$. Corresponding to each of these hyperplanes in the dual there is a variable in the primal CALP. If we set all the other variables to zero then we get a polynomial sized version of CALP whose solution is at least $R^*$. This version of CALP can be solved in polynomial time, giving the $\alpha$-approximate solution $\{x(\mathcal{E})\}$ of CALP. This completes the forward direction of Theorem 3.

In the reverse direction we start with an $\alpha$-approximate solution, say $\{x(\mathcal{E})\}$, of CALP and find an $\alpha$-approximate minimum cost embedding. Recall that the objective function value corresponding to this is $\lambda_{sol} = \sum_{\mathcal{E} \in \mathbb{E}} x(\mathcal{E})$. By LP-duality

we know that Xsoilct is an a-approximate value of the optimal of dual of CALP and xsol jet = c(e) 2 /(e). We set each 

y{e) := to get the corresponding solution (possibly infeasible) of the dual of CALP. 

If $P$ is the polytope defined by the constraints of the dual of CALP then we define its polar by $P^* := \{z \mid \langle z, y \rangle \ge 1, \forall y \in P\}$. It is easy to observe that if we can find an approximate solution over $P$ then we can approximately solve the separation oracle problem of $P^*$, and that $(P^*)^* = P$. Using the $\alpha$-approximate solution $\{y(e)\}$ found above we get an $\alpha$-approximate separation oracle of $P^*$. Using the ellipsoid method mentioned in the forward direction of the proof and this separation oracle we get an $\alpha$-approximate solution on $P^*$. As $(P^*)^* = P$, this solution over $P^*$ gives an $\alpha$-approximate separation oracle of $P$, which is equivalent to approximately solving the minimum cost embedding problem. In this case also, as the right hand sides of the cost constraints are all 1, the approximation ratio is preserved.


C. Step 3 of the proof 

In Section IV-B we showed that solving CALP is equivalent to solving minimum cost embedding. In this section we reduce a known NP-complete problem, SIMPLE MAX CUT, to the minimum cost embedding problem, thus proving that solving CALP is NP-hard.

A SIMPLE MAX CUT problem is defined as follows: Given an unweighted graph $H = (V_H, E_H)$ and a number $K$, check whether there is a partition of $V_H$ into two sets $V_1$ and $V_2$ such that there are at least $K$ edges between $V_1$ and $V_2$. Moreover,
it is known that if the input graph of SIMPLE MAX CUT problem is a cubic graph @ then the problem is MAX SNP-hard 
a. We start with an instance of SIMPLE MAX CUT with cubic graph and prove the MAX SNP-hardness of minimum cost 
embedding problem. 
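To make the SIMPLE MAX CUT instance concrete, the following sketch enumerates all partitions of a small cubic graph. It is an exponential-time illustration only, and the function name `max_cut` is ours. The complete graph on four vertices is used because it is the smallest cubic graph; the last line evaluates the cost threshold $28|E_H| - K$ that appears in Theorem 4.

```python
from itertools import product


def max_cut(vertices, edges):
    """Size of a largest cut, found by trying every 2-labelling of the vertices."""
    best = 0
    for labels in product((1, 2), repeat=len(vertices)):
        side = dict(zip(vertices, labels))
        crossing = sum(side[u] != side[v] for u, v in edges)
        best = max(best, crossing)
    return best


# K4: every vertex has degree exactly three, so the graph is cubic.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

K = max_cut(vertices, edges)
cost_threshold = 28 * len(edges) - K  # the bound 28|E_H| - K of Theorem 4
```

For this graph the largest cut separates two vertices from the other two.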

Given an instance $\phi = \{H, K\}$ of SIMPLE MAX CUT where $H$ is a cubic graph, we generate an instance of the minimum cost embedding problem $\psi = \{\mathcal{G}, S_{\mathcal{G}}, \omega_p, w; \mathcal{N}, S_{\mathcal{N}}, t, y\}$. Recall that $\mathcal{N} = (V, E)$ is the network graph with $S_{\mathcal{N}} \subset V$ as sources, $t$ as the sink and $y$ as the weight function on $E$. Similarly, $\mathcal{G} = (\Omega, \Gamma)$ is a computation DAG with $S_{\mathcal{G}}$ as sources, $\omega_p$ as the sink and $w$ as the weight function on $\Gamma$.

Theorem 4. For an instance $\phi$ of SIMPLE MAX CUT we construct an instance $\psi$ of the minimum cost embedding problem such that $\phi$ has a cut of size at least $K$ if and only if $\psi$ has an optimal embedding of cost at most $28|E_H| - K$.

Proof: First we create an undirected network graph. We consider $\mathcal{N}$ to be a complete graph on three vertices with $V = \{s_1, s_2, t\}$. We set $S_{\mathcal{N}} = \{s_1, s_2\}$ as the sources and $t$ as the sink vertex. We set the weight $y(e) = 1$, $\forall e \in E$.


A graph in which each vertex has degree exactly three is called a cubic graph.

Fig. 3: (a) Gadget for edge $(x,y)$ in $H$. (b) Redrawing the gadget shown in (a) with new vertices for each outgoing edge of $s_{ixy}$. Numbers near the edges represent their weights and the unlabeled edges have weight 1.

Fig. 4: (a) A vertex in $I$ with $h$ outgoing edges. (b) Gadget to replace the vertex shown in (a). Labels near the edges represent their weights and $z = h \max(l_1, \ldots, l_h) + 1$.


Now we create the computation graph $\mathcal{G}$ from $H$ using a series of gadgets, each of which enforces one of the desired properties on $\mathcal{G}$, as follows: We start with the gadget shown in Fig. 3(a) for each edge $(x,y) \in E_H$. This gadget is used to prove the MAX SNP-hardness of Multiterminal cut from SIMPLE MAX CUT in [8]. We direct readers to [8] for more details of this gadget. Note that each $s_{ixy}$ is connected to four vertices with edges of weight four. We create four vertices for each $s_{ixy}$ (one for each of its outgoing edges) and connect each neighbor of $s_{ixy}$ to exactly one of these newly created vertices. We put directions on the edges of Fig. 3(a) such that all the edges from $s_{ixy}$ are outgoing edges and $\omega_p$ has only incoming edges. The resulting gadget is shown in Fig. 3(b). It is easy to observe that Fig. 3(b) is just a redrawn directed version of Fig. 3(a) with a separate vertex for each edge of $s_{ixy}$. We denote the graph formed by replacing each edge of $E_H$ by the gadget of Fig. 3(b) by $I$. Finally, we replace every vertex of $I$ with multiple outgoing edges by the gadget shown in Fig. 4.

We set all the vertices of type $s^*_{ixy}$ as sources, i.e., $S_{\mathcal{G}} = \{s^*_{ixy} \mid x, y \in V_H, i \in \{1, 2\}, * \in \{x, y, a, b, c, d\}\}$, and the sink is $\omega_p$. From each edge gadget we get eight sources, thus $|S_{\mathcal{G}}| = 8|E_H|$. Similarly, the sink vertex $\omega_p$ has incoming edges.

Observe that graph Q has the following properties. 

Lemma 1. The DAG $\mathcal{G}$ created from an instance $\phi$ of SIMPLE MAX CUT has the following properties: (1) All the vertices in $S_{\mathcal{G}}$ have only outgoing edges and the sink vertex $\omega_p$ has only incoming edges. (2) All the intermediate vertices in $\mathcal{G}$ have at least one incoming and one outgoing edge. (3) There are no directed cycles in $\mathcal{G}$. (4) The out-degree of each vertex is bounded. (5) The weight on each edge is bounded.

Proof: The proof follows directly from the gadgets. Details of the proof are presented in Appendix B. ■

Recall that the network graph generated from SIMPLE MAX CUT has only three vertices. We assume that each source vertex of type $s^*_{1xy}$ in $\mathcal{G}$ is generated at $s_1 \in V$. Similarly, each source of type $s^*_{2xy}$ is generated at $s_2$. The sink vertex $\omega_p \in \Omega$ is mapped to $t \in V$. This completes the generation of an instance $\psi$ from $\phi$ of SIMPLE MAX CUT.

Before we start proving Theorem 4 we prove some properties of the gadgets of Figs. 3 and 4. We say that an edge of $\mathcal{G}$ is exposed in an embedding if its weight is considered while computing the cost of the embedding.

Lemma 2. In the minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$, any edge of weight $z$ is never exposed from the gadget of Fig. 4(b).

Lemma 3. If a 3-way multiterminal cut (with terminals being $s_{1xy}, s_{2xy}, \omega_p$) of the gadget shown in Fig. 3(a) has weight $W$ then there is an embedding of the gadget of Fig. 3(b) (along with Fig. 4(b)) of cost $W$ on $\mathcal{N}$.

Using Lemma 3 we can borrow the following result from Lemma 4.1 of [8] for the 3-way cut of Fig. 3(a). Refer to [8] for more details.

Lemma 4. There are embeddings of the gadget of Fig. 3(b) (along with Fig. 4(b)) on $\mathcal{N}$ with the following properties.

1) There is an embedding with cost 27 in which $x, a_{xy}$ are mapped to $s_1$; $y, b_{xy}$ to $s_2$; and $c_{xy}, d_{xy}$ to $t$. Similarly, there is an embedding with cost 27 in which $y, d_{xy}$ are mapped to $s_1$; $x, c_{xy}$ to $s_2$; and $a_{xy}, b_{xy}$ to $t$.

2) Any other embedding in which $x$ is mapped to $s_1$ but $y$ is not mapped to $s_2$, or vice versa, has cost strictly greater than 27.

3) Moreover, there are embeddings in which $x, y$ are both mapped to $s_1$ or both to $s_2$ that have cost exactly 28. For example, an embedding in which $x, y, a_{xy}$ are mapped to $s_1$; $b_{xy}$ to $s_2$; and $c_{xy}, d_{xy}$ to $\omega_p$ has cost 28. Similarly, an embedding in which $x, y, c_{xy}$ are mapped to $s_2$; $d_{xy}$ to $s_1$; and $a_{xy}, b_{xy}$ to $\omega_p$ has cost 28.

And finally we need the following lemma to prove Theorem 4.

Lemma 5. Given any embedding $\mathcal{E}$ with cost $C'(\mathcal{E})$ of $\mathcal{G}$ on $\mathcal{N}$ in which a vertex of $\mathcal{G}$ is mapped to multiple vertices of $\mathcal{N}$, we can obtain in polynomial time an embedding $\mathcal{E}'$ in which no vertex of $\mathcal{G}$ is mapped to more than one vertex of $\mathcal{N}$ and which has cost $C'(\mathcal{E}') \le C'(\mathcal{E})$.

Proofs of all these lemmas are presented in Appendix B.

Proof of forward direction (Theorem 4): We need to prove that if there is a cut of graph $H$ of size at least $K$ then there is an embedding of $\mathcal{G}$ on $\mathcal{N}$ of cost at most $28|E_H| - K$. Suppose there is a partition of $V_H$ into sets $V_1, V_2$ such that the number of edges between them is at least $K$. Then we create an embedding of $\mathcal{G}$ on $\mathcal{N}$ as follows: Map all the vertices of $V_1$ to $s_1$ and those of $V_2$ to $s_2$. Thus for every edge gadget, $x$ and $y$ are each mapped to either $s_1$ or $s_2$. If $x, y$ are mapped to different $s_i$, $i \in \{1, 2\}$, then map the intermediate vertices of this gadget according to the embedding of Lemma 4, point 1, and if they are mapped to the same vertex then use the embeddings given in point 3 of Lemma 4. Specifically, if $x, y$ are in the same set of the cut then the gadget contributes 28 to the cost of the embedding, else it contributes 27. As there are at least $K$ edges across the cut, the total cost of the embedding of $\mathcal{G}$ on $\mathcal{N}$ is at most $28|E_H| - K$.

Proof of backward direction (Theorem 4): Now we need to prove that if there is a minimum cost embedding of cost at most $28|E_H| - K$ then there is a cut of size at least $K$ for $H$. From Lemma 5 we know that the minimum cost embedding maps every vertex of $\mathcal{G}$ to only one vertex of $\mathcal{N}$. For each edge $(x,y) \in E_H$ we know from Lemma 4 (point 2) that the cost of the embedding from its gadget is $\ge 28$ unless $x, y$ (or $y, x$) are mapped to $s_1, s_2$ (or $s_2, s_1$) respectively. If the cost of the embedding is at most $28|E_H| - K$ then there must be at least $K$ edge gadgets in which $x, y$ (or $y, x$) are mapped to $s_1, s_2$ (or $s_2, s_1$) respectively. To get a cut of $H$ from this embedding we take the vertices $\{x \mid x \in V_H\}$ which are mapped to $s_1$ to be in $V_1$ and the vertices which are mapped to $s_2$ to be in $V_2$. The vertices of $V_H$ which are mapped to $\omega_p$ are arbitrarily put in the set $V_1$ or $V_2$. By our earlier arguments there are at least $K$ edges between $V_1$ and $V_2$, thus giving a cut of size at least $K$. ■

We now show that the reduction presented in Theorem |4] is indeed a linear reduction thus proving the MAX SNP-hardness 
of the minimum cost embedding problem ID. We just showed that an instance f of SIMPLE MAX CUT with optimal value 
opt($\phi$) can be converted into an instance $\psi$ of the minimum cost embedding problem in polynomial time such that opt($\psi$) $\le 28|E_H| -$ opt($\phi$). Note that for any instance of the SIMPLE MAX CUT problem, opt($\phi$) $\ge |E_H|/2$. Thus,

$\mathrm{opt}(\psi) \le \frac{55}{2}|E_H| \le 55\,\mathrm{opt}(\phi).$ (3)
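The bound opt($\phi$) $\ge |E_H|/2$ used above can be realized by a greedy placement. The sketch below is one standard way to do it and is not taken from the paper; the function name `greedy_cut` is ours. Each vertex is placed on the side opposite to the majority of its already placed neighbours, so at least half of the edges to earlier vertices are cut at every step.

```python
def greedy_cut(vertices, edges):
    """Place vertices one by one so that at least half of all edges are cut."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    side = {}
    for v in vertices:
        placed = [w for w in adj[v] if w in side]
        on_side_1 = sum(side[w] == 1 for w in placed)
        # Go opposite to the majority of the neighbours placed so far.
        side[v] = 2 if 2 * on_side_1 >= len(placed) else 1

    cut_size = sum(side[u] != side[v] for u, v in edges)
    return side, cut_size


# Same cubic toy graph as before: the complete graph on four vertices.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
side, cut_size = greedy_cut(vertices, edges)
```

Since `cut_size` is at least $|E_H|/2$, substituting it into $28|E_H| - \mathrm{opt}(\phi)$ gives the factor $55/2$ in Equation (3).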

For any solution $y$ of $\psi$ with cost($y$) $= 28|E_H| - K$, by Lemma 5 we can obtain an embedding $y'$ in which every vertex of $\mathcal{G}$ is mapped to only one vertex of $\mathcal{N}$ and which has cost at most $28|E_H| - K$. Let the cost of this new embedding be cost($y'$) $= 28|E_H| - K'$ where $K' \ge K$. By Theorem 4 we know that we can obtain a solution $x$ of $\phi$ from $y'$ of weight at least $K'$. Thus, $|\mathrm{cost}(x) - \mathrm{opt}(\phi)| \le |K' - \mathrm{opt}(\phi)|$. On the other hand, since opt($\psi$) $\le 28|E_H| -$ opt($\phi$), we have $|\mathrm{cost}(y) - \mathrm{opt}(\psi)| \ge |28|E_H| - K - 28|E_H| + \mathrm{opt}(\phi)|$. As opt($\phi$) $\ge K' \ge K$ we get,

$|\mathrm{cost}(x) - \mathrm{opt}(\phi)| \le |\mathrm{cost}(y) - \mathrm{opt}(\psi)|.$ (4)

Equations (3), (4) prove that the reduction presented in Theorem 4 is a linear reduction. Authors in ID showed that for SIMPLE MAX CUT no algorithm can achieve an approximation ratio of 0.997 unless P=NP. Combining with the linear reduction factors of Equations (3), (4) we get the following result.

Corollary 1. For a given DAG Q and network graph Af finding minimum cost embedding is MAX SNP-hard even when Q has 
bounded out-degree, weights on its edges are bounded, and Af has only three vertices. Moreover, it is hard to approximate 
above a factor of 0.0178 unless P=NP. 

A simple greedy algorithm can construct such a cut.

V. Algorithm for N with two vertices

In Theorem 4 (Section IV-C) we proved that finding a minimum cost embedding is NP-hard even when there are only three vertices in $\mathcal{N}$. In this section we present a polynomial time algorithm to find the minimum cost embedding when the network graph has only two vertices. By using the algorithm presented in this section and the technique of Theorem 3 we can obtain a rate maximizing schedule for an arbitrary computation graph on a two node network graph in polynomial time.

For all the discussion in this section we assume that the network graph $\mathcal{N}$ has two vertices $n_1, n_2$ connected via an edge of weight $x(n_1, n_2)$. The computation graph is assumed to be an arbitrary DAG $\mathcal{G}$. There are $\kappa$ sources in $\mathcal{G} = (\Omega, \Gamma)$, out of which $\kappa_1$ are mapped to $n_1$ and the others are mapped to $n_2$. The sink vertex $\omega_p$ of $\mathcal{G}$ is at node $n_2$. There is a weight function $\{w(\gamma) \mid \gamma \in \Gamma\}$ associated with the edges of $\mathcal{G}$. The problem is to find the embedding of $\mathcal{G}$ on $\mathcal{N}$ such that the cost of the embedding is minimized. Recall that the cost of an embedding is defined by Equation (2).

To find the minimum cost embedding we first reduce our problem to an instance of 2-Cut, which is defined as follows: Given a directed graph $J = (V_J, E_J)$ with weights on edges $\{g(i,j) \mid (i,j) \in E_J\}$ and two distinct vertices $j_1, j_2 \in V_J$, find two disjoint subsets $J_1, J_2 \subseteq V_J$ such that $j_1 \in J_1$, $j_2 \in J_2$ and the following optimal value is achieved.

$\mathrm{opt}(\text{2-Cut}(j_1, j_2)) := \min_{J_1, J_2} \big( \delta(J_1) + \delta(J_2) \big).$ (5)

For any set $A \subseteq V_J$, $\delta(A)$ is defined as the sum of weights of all the outgoing edges from $A$. In other words,

$\delta(A) := \sum_{i \in A,\, j \in V_J \setminus A} g(i,j).$ (6)

We show that the 2-Cut problem can be solved in polynomial time and then present an algorithm which converts the optimal solution of 2-Cut to a solution of the corresponding instance of minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$.
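The 2-Cut objective of Equations (5) and (6) can be evaluated by exhaustive search on a small graph. This is an exponential-time sketch meant only to make the definition concrete; it is not the polynomial time method of this section. The names `delta`, `two_cut_bruteforce` and the four-vertex graph are assumptions made for the example.

```python
from itertools import product


def delta(A, g):
    """Sum of the weights of the edges leaving the vertex set A (Equation (6))."""
    return sum(w for (i, j), w in g.items() if i in A and j not in A)


def two_cut_bruteforce(vertices, g, j1, j2):
    """Try every assignment of the remaining vertices to J1, J2, or neither."""
    others = [v for v in vertices if v not in (j1, j2)]
    best = None
    for labels in product((1, 2, 3), repeat=len(others)):
        J1 = {j1} | {v for v, lab in zip(others, labels) if lab == 1}
        J2 = {j2} | {v for v, lab in zip(others, labels) if lab == 2}
        value = delta(J1, g) + delta(J2, g)
        if best is None or value < best[0]:
            best = (value, J1, J2)
    return best


# Hypothetical directed graph with edge weights g(i, j).
vertices = ["j1", "j2", "a", "b"]
g = {("j1", "a"): 5, ("a", "b"): 1, ("j2", "b"): 4, ("b", "a"): 2}

value, J1, J2 = two_cut_bruteforce(vertices, g, "j1", "j2")
```

Here the best choice keeps `a` with `j1` and `b` with `j2`, so only the two light edges between `a` and `b` are counted.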

Lemma 6. Given any directed graph $J$ and two distinct vertices $j_1, j_2$ of it, 2-Cut can be solved in polynomial time.

Proof: Recall that the solution of opt(2-Cut($j_1, j_2$)) consists of two disjoint subsets $J_1, J_2$ of $V_J$ such that $j_1 \in J_1$ and $j_2 \in J_2$. Equation (5) can be written as

$\mathrm{opt}(\text{2-Cut}(j_1, j_2)) = \min_{J_1 \subseteq V_J} \big[ \delta(J_1) + \min_{J_2 \subseteq V_J \setminus J_1} \delta(J_2) \big].$

For a given $J_1$ we need to compute the right hand side of the above equation in polynomial time. To do so we modify the equation as follows: Let $A$ be a subset of $V_J$ such that $j_1 \notin A$. Then we rewrite the equation as

$\mathrm{opt}(\text{2-Cut}(j_1, j_2)) = \min_{A \subseteq V_J} \big[ \delta(A \cup j_1) + \min_{C \subseteq V_J \setminus \{A, j_1, j_2\}} \delta(C \cup j_2) \big].$ (7)

The second term of the right hand side of the above equation can be computed in polynomial time by computing the minimum cut of $j_2$ by considering the subsets from $V_J \setminus \{A, j_1, j_2\}$. Thus for a given set $A$, the right hand side of Equation (7) can be computed in polynomial time. Now we show that this is indeed a submodular function and thus the set $A$ which minimizes the value can also be found in polynomial time.

A function $h$ on the subsets of a set $U$ is submodular if for any two sets $Y, Z \subseteq U$, $h(Y) + h(Z) \ge h(Y \cap Z) + h(Y \cup Z)$. For any two subsets $Y, Z \subseteq V_J$ it is easy to observe that $\delta(Y \cup Z) \le \delta(Y) + \delta(Z) - \delta(Y \cap Z)$. Hence $\delta$ is a submodular function. Let $X \subseteq V_J \setminus \{A, j_1, j_2\}$ be the set which minimizes the second term of Equation (7). Then for a set $A$, let $h(A) := \delta(A \cup j_1) + \delta(X \cup j_2)$. Similarly, for a set $B$, $h(B) = \delta(B \cup j_1) + \delta(Y \cup j_2)$ where $Y \subseteq V_J \setminus \{B, j_1, j_2\}$ minimizes the second term of Equation (7). Also, $h(A \cup B) = \delta(A \cup B \cup j_1) + \delta(Z \cup j_2)$ for some $Z \subseteq V_J \setminus \{A \cup B, j_1, j_2\}$. Note that $(X \cap Y)$ and $(A \cup B)$ are disjoint sets, which implies that $X \cap Y \subseteq V_J \setminus \{A \cup B, j_1, j_2\}$. Thus,

$h(A \cup B) \le \delta(A \cup B \cup j_1) + \delta((X \cap Y) \cup j_2).$

Similarly $h(A \cap B) = \delta((A \cap B) \cup j_1) + \delta(W \cup j_2)$ for some $W \subseteq V_J \setminus \{A \cap B, j_1, j_2\}$. Note that $(X \cup Y)$ and $(A \cap B)$ are disjoint sets. Thus,

$h(A \cap B) \le \delta((A \cap B) \cup j_1) + \delta((X \cup Y) \cup j_2).$

As $\delta$ is a submodular function, it is easy to observe that $h(A \cup B) + h(A \cap B) \le h(A) + h(B)$. This proves that the right hand side of Equation (7) is a submodular function and opt(2-Cut($j_1, j_2$)) can be obtained in polynomial time by using the algorithm presented in [24]. ■
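The submodularity of $\delta$ that the proof relies on can be checked exhaustively on a small directed graph. The sketch below is a sanity check under the assumption of non-negative edge weights, not a proof; the helper names `subsets` and `is_submodular` and both toy graphs are ours. The second graph shows that the inequality can fail once a weight is negative.

```python
from itertools import combinations


def delta(A, g):
    """Sum of the weights of the edges leaving the vertex set A."""
    return sum(w for (i, j), w in g.items() if i in A and j not in A)


def subsets(items):
    """Yield every subset of items as a frozenset."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(vertices, g):
    """Check delta(Y) + delta(Z) >= delta(Y & Z) + delta(Y | Z) for all Y, Z."""
    all_sets = list(subsets(vertices))
    return all(
        delta(Y, g) + delta(Z, g) >= delta(Y & Z, g) + delta(Y | Z, g)
        for Y in all_sets
        for Z in all_sets
    )


# Non-negative weights: the directed cut function satisfies the inequality.
g = {("j1", "a"): 5, ("a", "b"): 1, ("j2", "b"): 4, ("b", "a"): 2}
check_nonnegative = is_submodular(["j1", "j2", "a", "b"], g)

# A single negative weight already breaks it (Y = {a}, Z = {b}).
check_negative = is_submodular(["a", "b"], {("a", "b"): -1})
```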

Given an instance $\psi = \{\mathcal{G}, S_{\mathcal{G}}, \omega_p, w, \mathcal{N}, S_{\mathcal{N}}, t, y\}$ of minimum cost embedding we create an instance $\phi = (J, g, j_1, j_2)$ of 2-Cut.

Theorem 5. The instance $\psi$ of the minimum cost embedding problem has an optimal embedding of cost $C$ if and only if the corresponding instance $\phi$ of 2-Cut has an optimal cut of weight $C$.

Proof: We first construct the directed graph $J$ for the 2-Cut instance from $\mathcal{G}, \mathcal{N}$ as follows: Replace each vertex of $\mathcal{G}$, except for the sink vertex $\omega_p$, by the gadget shown in Fig. 5. Add two vertices labeled $j_1, j_2$ to this graph. Add outgoing edges from $j_1$ to all the “in” vertices of the sources which are mapped to $n_1 \in \mathcal{N}$ with weight $\infty$. Similarly add outgoing edges from $j_2$ to the remaining “in” vertices of the sources and the sink $\omega_p$ (note that these vertices are mapped to $n_2 \in \mathcal{N}$) with weight $\infty$. We label the resulting directed graph by $J$ for the 2-Cut instance, with $j_1, j_2$ being the two vertices for which opt(2-Cut($j_1, j_2$)) has to be computed.

Recall that the weight of an edge of $\mathcal{G}$ is associated with the sub-function it carries. Thus all outgoing edges of a vertex of $\mathcal{G}$ have the same weight.

Fig. 5: (a) A vertex in $\mathcal{G}$ with $k$ incoming edges and $h$ outgoing edges of weight $l_u$. (b) Gadget to replace $u$ shown in (a). Labels near the edges represent their weights.

Proof of Theorem 5 follows directly from the following two lemmas.

Lemma 7. If for the instance $\psi$ there is an embedding $\mathcal{E}$ of cost $C$ then there is a 2-Cut($j_1, j_2$) of weight $C$ for the instance $\phi$.

Proof: Before proving the lemma we recall a few notations and ideas about $\mathcal{G}$ and its embedding on $\mathcal{N}$. Every vertex $u$ of $\mathcal{G}$ computes a specific function $\theta$ and all its outgoing edges carry the same function. The set of all the successor functions of $\theta$ is represented by $\Phi_\downarrow(\theta)$. An embedding of $\mathcal{G}$ on $\mathcal{N}$ gives us a mapping of vertices of $\mathcal{G}$ to those of $\mathcal{N}$. It tells us on which vertices of $\mathcal{N}$ the function $\theta$ is computed. The network graph $\mathcal{N}$ for our instance $\psi$ has only two vertices $n_1, n_2$. Thus any function is computed either at $n_1$ or at $n_2$ or at both. Also recall that the 2-Cut($j_1, j_2$) partitions the vertex set $V_J$ of $J$ into three disjoint sets $J_1, J_2, J_3$ such that $j_1 \in J_1$, $j_2 \in J_2$. In all the discussion below we assume that vertex $u \in \Omega$ computes function $\theta$. We compute the 2-Cut($j_1, j_2$) from the embedding $\mathcal{E}$ as follows:

1) Put $j_1$ ($j_2$) in $J_1$ ($J_2$ respectively).

2) If a source vertex $\omega_j \in S_{\mathcal{G}}$ is mapped to $n_1$ ($n_2$) then put $\omega_j^{in}$ in $J_1$ ($J_2$ respectively). Put the sink vertex $\omega_p$ in $J_2$.

3) If $\theta$ is computed at both $n_1, n_2$ under embedding $\mathcal{E}$ then put $u^{in}$ in $J_3$.

4) If $\theta$ is computed at only one vertex, say $n_1$ ($n_2$), then put $u^{in}$ in $J_1$ ($J_2$).

5) If all the functions in $\Phi_\downarrow(\theta)$ are computed only at $n_1$ ($n_2$) then put $u^{out}$ in $J_1$ ($J_2$).

6) If some of the functions of $\Phi_\downarrow(\theta)$ are computed at $n_1$ and some are computed at $n_2$ then put $u^{out}$ in $J_3$.

It is easy to observe that this cut is a valid 2-Cut($j_1, j_2$). Now we compute the weight of the cut by computing $\delta(J_1), \delta(J_2)$. First note that none of the $\infty$ weight edges of $j_1, j_2$ are in the cut, as the corresponding sources and the sink are placed in $J_1, J_2$. Similarly, any vertex $u^{out}$ is placed in $J_1$ or $J_2$ if all its successor functions are computed there. Thus, no $\infty$ weight is in $\delta(J_1), \delta(J_2)$ and the cut size is finite. Observe that, by the way $J$ is constructed from $\mathcal{G}$, corresponding to all the outgoing edges of any vertex $u$ there is only one edge $(u^{in}, u^{out}) \in E_J$ of the same weight. This edge is in the cut constructed above iff any of the corresponding edges are exposed in $\mathcal{E}$ (points 5, 6). Hence the weight of the cut constructed above is the same as the cost of $\mathcal{E}$. ■

Lemma 8. If there is a 2-Cut($j_1, j_2$) for the instance $\phi$ of weight $C$ then there is an embedding of $\mathcal{G}$ on $\mathcal{N}$ of cost $\le C$.

Proof: Recall that a 2-Cut($j_1, j_2$) partitions the elements of $V_J$ into three sets $J_1, J_2, J_3$. We create an embedding from the cut as follows: If any vertex $u^{in} \in V_J$ is in $J_1$ ($J_2$) then map $u$ to $n_1$ ($n_2$) under embedding $\mathcal{E}$. If $u^{in}$ is in $J_3$ then map $u$ to both $n_1$ and $n_2$. As the weight of the cut is finite, we know that for all the sources of $\mathcal{G}$ which are connected to $j_1$ (or $j_2$) the corresponding “in” vertices are in $J_1$ (or $J_2$). This ensures that all the sources are mapped either to $n_1$ or to $n_2$ under $\mathcal{E}$. Similarly, the sink of $\mathcal{G}$ is in $J_2$ and thus mapped to $n_2$ under $\mathcal{E}$. Observe that all the edges which are in $\delta(J_1)$ and $\delta(J_2)$ are exposed in the embedding $\mathcal{E}$. Hence the cost of this embedding is the same as the weight $C$ of the cut. As the vertices in $J_3$ are mapped to both $n_1$ and $n_2$, there will be some redundant computations in $\mathcal{E}$. For example, some vertex $u$ might be computed at both nodes while all its successors are computed only at $n_1$, thus making the computation at $n_2$ redundant. To get a valid embedding we need to remove such computations, and removing (or pruning) such computations will only reduce the cost from $C$. As there are only two nodes in the network, checking for redundant computations for each vertex of $\mathcal{G}$ can be done in polynomial time, and this gives an embedding $\mathcal{E}$ of cost $\le C$. ■

Proof of forward direction (Theorem 5): We need to prove that the minimum cost embedding instance has an optimal embedding of cost $C$ if the 2-Cut instance has an optimal cut of weight $C$. Let $\mathcal{E}$ be the embedding obtained by applying the procedure presented in the proof of Lemma 8 to the optimal 2-Cut, with cost $C' \le C$. Suppose $C' < C$. Then by Lemma 7 we can obtain a 2-Cut of $\phi$ of weight $C'$. But this contradicts the fact that $\phi$ has the optimal cut of weight $C$. Thus the embedding $\mathcal{E}$ obtained from the optimal cut of $\phi$ has cost $C' = C$. By the same argument no embedding can have cost less than $C$, so $\mathcal{E}$ is optimal.

Proof of backward direction (Theorem 5): Now we need to prove that if there is an optimal embedding of cost $C$ then $\phi$ has the optimal cut of weight $C$. By Lemma 7 we can obtain a 2-Cut for $\phi$ of weight $C$ from the optimal embedding of $\psi$. This cut has to be the optimal cut, else we can get an embedding of cost less than $C$ by Lemma 8. ■

VI. Approximate Algorithms 

In Section IV we proved that finding a rate maximizing schedule is MAX SNP-hard. In this section we define a restricted class of embeddings and present some approximation algorithms for the corresponding maximum rate schedule problem.

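Definition 5 (R-Embedding). A restricted embedding (R-Embedding) of $\mathcal{G}$ on $\mathcal{N}$ is a function $\mathcal{E}'$ that maps each edge $\gamma \in \Gamma$ to a path in $\mathcal{N}$ and obeys the following set of rules.

1) If for some $\gamma \in \Gamma$, tail($\gamma$) $= \omega_i$, $i \in [1, \kappa]$, then start($\mathcal{E}'(\gamma)$) $= s_i$.

2) If for some $\gamma \in \Gamma$, head($\gamma$) $= \omega_p$, then end($\mathcal{E}'(\gamma)$) $= t$.

3) If $\gamma_i \in \Phi_\uparrow(\gamma_j)$ for some $\gamma_i, \gamma_j \in \Gamma$ then end($\mathcal{E}'(\gamma_i)$) $=$ start($\mathcal{E}'(\gamma_j)$).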

Note that any intermediate function is computed only once in the network under R-Embedding. R-Embeddings are a special 
case of the embedding (defined in Definition [U and let E' be the set of all the R-Embeddings of Q on Af. 

We can write a packing linear program, similar to CALP (presented in Section III), in which the embeddings come from the set $\mathbb{E}'$ instead of the general set of embeddings $\mathbb{E}$. Let us call this LP R-CALP. We observe that the separation oracle of the dual of R-CALP also reduces to the problem of finding a minimum cost R-Embedding, where the cost of the R-Embedding is defined by Equation (2). Henceforth we refer to the problem of finding the minimum cost R-Embedding as MinCost(C'). It is easy to verify that Theorem 3 also holds in this case, giving us the following corollary.

Corollary 2. There is a polynomial time $\alpha$-approximation algorithm to solve R-CALP if and only if there is a polynomial time $\alpha$-approximation algorithm for solving MinCost(C') of $\mathcal{G}$ on $\mathcal{N}$.

In Section IV-C we proved that the minimum cost embedding problem is MAX SNP-hard by a reduction from the SIMPLE MAX CUT problem. Recall that the instance of the minimum cost embedding problem which we created has an optimal embedding in which each vertex of $\mathcal{G}$ is mapped to only one vertex of $\mathcal{N}$. Thus the reduction presented in Theorem 4 also proves that solving the minimum cost R-Embedding problem is MAX SNP-hard. In this section we present some approximation algorithms to solve the MinCost(C') problem, thus giving approximate solutions for R-CALP.

We first present a version of the minimum cost embedding problem which has been studied in the literature and relate it to the one presented in Section IV-A by Theorem 6. Using the result of Theorem 6 and the procedure described in the proof of Theorem 3 we give a couple of algorithms to find approximate solutions of R-CALP for special classes of computation graphs.

A. A version of minimum cost embedding 

A version of MinCost($C$) has been studied in the literature under various names, such as function computation [25], [27], optimal operator placement [1], 0, [22], [29] and module placement Q, [11], [20], [26].

The cost model of this literature differs from our cost model (MinCost($C$)) in the following two ways: (1) in their cost model two outgoing edges of a vertex $\omega$ of $\mathcal{G}$ can have different weights, and (2) if an edge $e \in E$ is used by multiple, say $z$, outgoing edges of a vertex $\omega$ of $\mathcal{G}$ in an embedding, then the weight $x(e)$ is counted $z$ times when computing the cost of the embedding. In our cost model, even if an edge $e$ is used by multiple outgoing edges of a vertex of $\mathcal{G}$, the weight $x(e)$ is counted only once. We define their cost model more formally below.

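Let $\phi_{\gamma}(e) := \mathbb{1}\{e \in \mathcal{E}'(\gamma)\}$ be an indicator function which takes value 1 if an edge $e$ in $\mathcal{N}$ is used by an edge $\gamma$ of $\mathcal{G}$ under the R-Embedding $\mathcal{E}'$. Then, given a vector $\{x(e) \mid e \in E\}$ and a weight function $\{w(\gamma) \mid \gamma \in \Gamma\}$, the cost of an R-Embedding is defined as:

$$\bar{C}(\mathcal{E}') := \sum_{e \in E} x(e) \Big( \sum_{\gamma \in \Gamma} w(\gamma)\, \phi_{\gamma}(e) \Big). \qquad (8)$$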

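Definition 6 (MinCost($\bar{C}$)). Given a network graph $\mathcal{N}$ with weight function $x$ on its edges and a computation graph $\mathcal{G}$ with weight function $w$ on its edges, find an R-Embedding opt($\bar{C}$) such that:

$$\mathrm{opt}(\bar{C}, \mathcal{G}, \mathcal{N}) := \arg\min_{\mathcal{E}' \in \mathbb{E}'} \bar{C}(\mathcal{E}')$$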

We omit $\mathcal{G}, \mathcal{N}$ from the above expression when they are clear from the context and use opt($\bar{C}$) to denote the optimal embedding for MinCost($\bar{C}$). Observe that opt($\bar{C}$) has the following properties: (1) A vertex of $\mathcal{G}$ is mapped to only one vertex of $\mathcal{N}$. This property is imposed by the definition of an R-Embedding. (2) Every edge $\gamma$ of $\mathcal{G}$ is mapped to the shortest path between its mapped end points in $\mathcal{N}$, due to the nature of the cost defined in Equation (8).

Note that the weights in this case are defined on the edges of $\mathcal{G}$, and the outgoing edges of a vertex in $\mathcal{G}$ can have different weights.

Example 4 below illustrates the difference between the two cost models and shows how our cost model is more natural when $\mathcal{G}$ is a DAG.

Example 4. We revisit Example 1 here. Recall that for the computation graph of Fig. 1, $w(\gamma) = 1\ \forall \gamma \in \Gamma$. Let $x(e) = 1\ \forall e \in E$ for the network shown in Fig. 1. Then the cost of the embedding $\mathcal{E}_1$ (shown in Fig. 1) according to Equation (2) is $C(\mathcal{E}_1) = 6$, while the cost according to Equation (8) is $\bar{C}(\mathcal{E}_1) = 7$. This difference arises because the cost incurred over link $xz$ for the transmission of function $\theta_5$ in $\mathcal{E}_1$ is counted only once by Equation (2), while Equation (8) counts it twice. In practice the function $\theta_5$ is transmitted only once over $xz$ in $\mathcal{E}_1$, and the rate computation for this example does account for this.
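The difference between the two cost models can be sketched in a few lines of Python. This is only an illustration on a hypothetical three-edge instance, not the graph of Example 4: the names `cost_shared`, `cost_per_edge`, the vertex labels and the unit weights are all assumptions made for the sketch. A vertex `w5` has two outgoing edges whose paths share the network link `(x, z)`.

```python
from collections import defaultdict

def cost_shared(x, embedding, tail, weight):
    """Cost in which a network edge is charged once per computation-graph
    vertex, however many outgoing edges of that vertex use it."""
    used = defaultdict(set)                  # tail vertex -> network edges used
    for gamma, path in embedding.items():
        used[tail[gamma]].update(path)
    return sum(weight[omega] * x[e]
               for omega, edges in used.items() for e in edges)

def cost_per_edge(x, embedding, edge_weight):
    """Cost in which every computation-graph edge pays separately for each
    network edge on its path (the model studied in the literature)."""
    return sum(edge_weight[gamma] * x[e]
               for gamma, path in embedding.items() for e in path)

# Hypothetical instance: w5 is placed at x and feeds w6 (at z) and w7 (at t).
x = {("x", "z"): 1, ("z", "t"): 1}           # unit link weights
tail = {"g1": "w5", "g2": "w5"}              # both edges leave w5
embedding = {"g1": [("x", "z")], "g2": [("x", "z"), ("z", "t")]}

shared = cost_shared(x, embedding, tail, {"w5": 1})
per_edge = cost_per_edge(x, embedding, {"g1": 1, "g2": 1})
print(shared, per_edge)  # 2 3
```

The shared link `(x, z)` is paid for once in the first model and twice in the second, which is the same effect that separates the two costs in Example 4.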

Polynomial time algorithms to solve the MinCost($\bar{C}$) problem when $\mathcal{G}$ is a tree are available in the literature, e.g., 0, [25], [29]. The authors of [11] gave a polynomial time algorithm when $\mathcal{G}$ is a $k$-tree, while [27] proves that MinCost($\bar{C}$) is MAX SNP-hard for general $\mathcal{G}$. A polynomial time algorithm for a layered $\mathcal{G}$ is presented in [27]. The MinCost($\bar{C}$) problem is also related to two well studied problems, Multiterminal Cut and 0-extension. We explain the relation with these problems below.

a) Connection to the Multiterminal Cut problem: The MinCost($\bar{C}$) problem, when $\mathcal{N}$ is a complete graph on $k$ terminals with weights $x(e) = 1\ \forall e \in E$, is equivalent to the well known NP-complete Multiterminal Cut problem 0. The Multiterminal Cut problem is defined as follows: Given a graph $\mathcal{G} = (\Omega, \Gamma)$ with weights $w(\gamma)$ on its edges and a set of $k$ of its vertices (the terminals), divide the graph $\mathcal{G}$ into $k$ parts such that there is exactly one terminal in each part and the sum of the weights of the edges across these parts is minimum. In other words, the Multiterminal Cut problem asks for an R-Embedding $\mathcal{E}$ of $\mathcal{G}$ on a complete graph $\mathcal{N} = (V, E)$ with $|V| = k$ and $x(e) = 1\ \forall e \in E$ such that the cost $\bar{C}(\mathcal{E})$ is minimum. Refer to [27] for the details of this reduction, which proves that the MinCost($\bar{C}$) problem is MAX SNP-hard even if the number of terminals $k$ and the weights on the edges are constant.

b) Connection to the 0-extension problem: When the network graph $\mathcal{N}$ is a complete graph with $k$ vertices but with arbitrary edge weights, the 0-extension problem can be seen as a special case of the MinCost($\bar{C}$) problem. The 0-extension problem was first introduced by ini and is defined as follows: Given a graph $\mathcal{G} = (\Omega, \Gamma)$ with non-negative edge weights $w(\gamma)$ on its edges and a metric $d$ defined on a subset $T \subseteq \Omega$, find an assignment $\mathcal{E}$ of every $\omega \in \Omega$ to $\mathcal{E}(\omega) \in T$ such that $\mathcal{E}(\omega) = \omega\ \forall \omega \in T$ and the cost $\sum_{(\omega_1, \omega_2) \in \Gamma} w(\omega_1, \omega_2)\, d(\mathcal{E}(\omega_1), \mathcal{E}(\omega_2))$ is minimum. In other words, the 0-extension problem asks for an R-Embedding $\mathcal{E}$ of $\mathcal{G}$ on a complete graph $\mathcal{N} = (V, E)$ with $|V| = |T|$ and $\{x(e) \mid e \in E\}$, where $x(e)$ imposes a metric on $V$, such that the cost $\bar{C}(\mathcal{E})$ is minimum. The 0-extension problem is well studied and we refer the readers to m for a detailed review of the results available in the literature. The authors of M proved that for every $\epsilon > 0$ there is no polynomial time $O((\log p)^{1/4-\epsilon})$-approximation algorithm for 0-extension unless $NP \subseteq DTIME(p^{\mathrm{poly}(\log p)})$, where $p$ is the number of vertices in $\mathcal{G}$, with the maximum degree of any vertex and the weight of any edge bounded by $\mathrm{poly}(\log p)$. This result also holds for the MinCost($\bar{C}$) problem, as 0-extension is a special case of it.
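The Multiterminal Cut view in a) can be made concrete with a brute-force sketch: every non-terminal vertex is assigned to one terminal, and an edge pays its weight exactly when its end points land on different terminals, which is the embedding cost on a complete network with unit edge weights. The function name `multiterminal_cut` and the small instance are hypothetical, and exhaustive search is used only because the instance is tiny.

```python
from itertools import product

def multiterminal_cut(vertices, terminals, w):
    """Exhaustively assign every non-terminal vertex to a terminal and return
    the cheapest assignment. w maps an edge (a, b) to its weight."""
    free = [v for v in vertices if v not in terminals]
    best_cost, best_place = float("inf"), None
    for choice in product(terminals, repeat=len(free)):
        place = {t: t for t in terminals}    # each terminal stays on itself
        place.update(zip(free, choice))
        # With unit network weights, an edge costs w(a, b) iff it is cut.
        cost = sum(wt for (a, b), wt in w.items() if place[a] != place[b])
        if cost < best_cost:
            best_cost, best_place = cost, place
    return best_cost, best_place

# Hypothetical instance with three terminals and two free vertices.
terminals = ["t1", "t2", "t3"]
vertices = terminals + ["a", "b"]
w = {("a", "t1"): 3, ("a", "t2"): 1, ("a", "t3"): 1,
     ("a", "b"): 2, ("b", "t2"): 5}

cut_cost, cut_place = multiterminal_cut(vertices, terminals, w)
print(cut_cost)  # 4
```

Replacing the test `place[a] != place[b]` by a metric value `d[place[a]][place[b]]` turns the same loop into a brute-force solver for 0-extension, as described in b).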

Next we prove a relation between the MinCost($C$) and MinCost($\bar{C}$) problems.

Theorem 6. Given a network graph $\mathcal{N}$ with weight function $x$ on its edges and a computation graph $\mathcal{G}$ with weight function $w$ on its edges, the optimal solution of the MinCost($\bar{C}$) problem gives a $D$-approximation of the MinCost($C$) problem, where $D$ is the maximum out-degree of any vertex in $\mathcal{G}$.

Proof: Recall that the cost of an R-Embedding of $\mathcal{G}$ on $\mathcal{N}$ is computed by Equations (8) and (2) in the MinCost($\bar{C}$) problem (denoted by $\bar{C}(\mathcal{E})$) and the MinCost($C$) problem (denoted by $C(\mathcal{E})$), respectively. Consider a computation graph $\mathcal{G}$ in which no vertex has more than $D$ outgoing edges. As seen earlier, the weight of an edge $e$ of $\mathcal{N}$ is counted multiple times in $\bar{C}(\mathcal{E})$ if it is used by multiple outgoing edges of a vertex of $\mathcal{G}$ in an embedding $\mathcal{E}$, but it is counted only once in $C(\mathcal{E})$. Thus, for any embedding $\mathcal{E}$, $C(\mathcal{E}) \le \bar{C}(\mathcal{E})$. By the same argument, if the maximum number of outgoing edges of any vertex of $\mathcal{G}$ is $D$, then an edge $e$ of $\mathcal{N}$ can be used at most $D$ times by the outgoing edges of any one vertex. Thus the cost coming from the mapping of the outgoing edges of a vertex of $\mathcal{G}$ on any edge $e$ of $\mathcal{N}$ in $\bar{C}(\mathcal{E})$ is at most $D$ times the cost coming from $e$ in $C(\mathcal{E})$, which implies that $\bar{C}(\mathcal{E}) \le D\,C(\mathcal{E})$. Combining both arguments we have

$$C(\mathcal{E}) \le \bar{C}(\mathcal{E}) \le D\,C(\mathcal{E}). \qquad (9)$$

Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be the optimal solutions of the MinCost($\bar{C}$) and MinCost($C$) problems respectively. Then $C(\mathcal{E}_2) \le C(\mathcal{E}_1) \le \bar{C}(\mathcal{E}_1) \le \bar{C}(\mathcal{E}_2) \le D\,C(\mathcal{E}_2)$, where the first and third inequalities are due to the definitions of $\mathcal{E}_2$ and $\mathcal{E}_1$, and the second and fourth inequalities are due to Equation (9). Thus,

$$C(\mathcal{E}_2) \le C(\mathcal{E}_1) \le D\,C(\mathcal{E}_2).$$

This proves the theorem. ■
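The chain of inequalities in the proof can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions rather than part of the proof: the four-vertex network, the computation graph with $D = 2$, the unit weights on computation edges, and the helper names `bfs_path` and `both_costs` are all hypothetical, and the search is restricted to embeddings that route every edge along a fewest-hop path.

```python
from collections import defaultdict, deque
from itertools import product

def bfs_path(adj, src, dst):
    """Fewest-hop path from src to dst as a list of undirected network edges."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = []
    while parent[dst] is not None:
        path.append(frozenset((dst, parent[dst])))
        dst = parent[dst]
    return path

def both_costs(g_edges, place, adj, x):
    """Return (shared cost, per-edge cost) of one embedding, unit edge weights."""
    used, per_edge = defaultdict(set), 0
    for a, b in g_edges:
        path = bfs_path(adj, place[a], place[b])
        used[a].update(path)                 # links used by outgoing edges of a
        per_edge += sum(x[e] for e in path)
    shared = sum(x[e] for edges in used.values() for e in edges)
    return shared, per_edge

# Hypothetical network: a 4-cycle with link weights x(e).
x = {frozenset((0, 1)): 1, frozenset((1, 2)): 2,
     frozenset((2, 3)): 1, frozenset((0, 3)): 3}
adj = defaultdict(list)
for e in x:
    u, v = tuple(e)
    adj[u].append(v)
    adj[v].append(u)

# Hypothetical computation graph: sources a, b; sink e; maximum out-degree D = 2.
g_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
D = 2
pinned = {"a": 0, "b": 1, "e": 3}

results = [both_costs(g_edges, dict(pinned, c=pc, d=pd), adj, x)
           for pc, pd in product(range(4), repeat=2)]

pointwise_ok = all(c <= c_bar <= D * c for c, c_bar in results)   # Equation (9)
c_e1 = min(results, key=lambda r: r[1])[0]   # shared cost of the per-edge optimum
c_e2 = min(r[0] for r in results)            # best shared cost in this family
print(pointwise_ok, c_e2 <= c_e1 <= D * c_e2)
```

Because inequality (9) holds for every embedding in the enumerated family, the embedding that minimises the per-edge cost is within a factor $D$ of the best shared cost in that family, mirroring the last step of the proof.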

Theorem 6 implies that an algorithm which gives an $\alpha$-approximate solution for the MinCost($\bar{C}$) problem also gives an $\alpha D$-approximate solution for the MinCost($C$) problem. Recall that by Theorem 3 there is an $\alpha$-approximation algorithm for solving R-CALP if and only if there is an $\alpha$-approximation algorithm for the MinCost($C$) problem. Combining this fact with the hardness result for 0-extension in ifT^ we get the following result.

Because of the two outgoing edges of node $\omega_5$ in $\mathcal{G}$.

Fig. 6: A layered computation graph (number of layers = $r$)

Corollary 3. Given an arbitrary network graph $\mathcal{N}$ and a computation graph $\mathcal{G}$ with $p$ vertices in which the maximum degree of a vertex and the maximum weight on an edge are $\mathrm{poly}(\log p)$, for any $\epsilon > 0$ there is no polynomial time approximation algorithm with approximation ratio $O(\mathrm{poly}(\log p)(\log p)^{1/4-\epsilon})$ for solving R-CALP unless $NP \subseteq DTIME(p^{\mathrm{poly}(\log p)})$.

Now we present polynomial time approximation algorithms for special classes of the computation graph $\mathcal{G}$.


B. When $\mathcal{G}$ is a layered graph

In this section we consider the case when $\mathcal{G}$ is a layered graph. An example of a layered graph is shown in Fig. 6. We assume that there are $r$ layers and each layer has at most $W$ vertices. We number the layers from $\{1, \ldots, r\}$ and the vertices of a layer $l$ by $\{\omega_{1l}, \ldots, \omega_{Wl}\}$. An edge $(\omega_{ai}, \omega_{bj})$ is present only if $j = i + 1$. We also assume that the sink vertex is on the $r$-th layer. Note that this implies that the out-degree of any vertex in a layered graph is at most $W$. Commonly used layered computation graphs include the butterfly structure of the fast Fourier transform (FFT), the correlation function, and functions of Boolean data in Sum of Product (or Product of Sum) form.

A polynomial time algorithm is presented in [27] which solves the MinCost($\bar{C}$) problem for a layered $\mathcal{G}$ and an arbitrary $\mathcal{N}$. This algorithm takes $O(rn^{2W})$ time, where $n$ is the number of vertices in $\mathcal{N}$. Since the out-degree of any vertex of a layered graph is at most $W$, Theorem 6 implies that this algorithm is a $W$-approximation algorithm for the MinCost($C$) problem. Recall that the MinCost($C$) problem is the separation oracle for the dual of R-CALP, and by the method described in Section IV-C we can solve R-CALP by using the MinCost($C$) solution. This leads us to the following result.

Corollary 4. Given an arbitrary network graph $\mathcal{N}$ with non-negative capacities on its edges and a layered computation graph $\mathcal{G}$ with $r$ layers and at most $W$ vertices in each layer, there is a polynomial time $W$-approximation algorithm to solve R-CALP.
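One way to see why a layered computation graph admits an exact algorithm for the per-edge cost is a dynamic program over layers: since edges only join consecutive layers, the best cost of a joint placement of one layer depends only on the joint placement of the previous layer. The sketch below is a plain illustration of that idea and is not claimed to be the algorithm of the cited work; the names `layered_min_cost` and `brute_force` and the instance are assumptions. It enumerates up to $n^W$ placements per layer, so it is exponential in the width.

```python
from itertools import product

def layered_min_cost(layers, w, d, pinned):
    """Layer-by-layer DP. layers: list of vertex lists; w: {(a, b): weight} for
    edges between consecutive layers; d: shortest-path distance matrix of the
    network; pinned: fixed network vertex for each source and for the sink."""
    n = len(d)

    def states(layer):
        options = [[pinned[v]] if v in pinned else range(n) for v in layer]
        return list(product(*options))

    prev_layer = layers[0]
    best = {s: 0 for s in states(prev_layer)}
    for layer in layers[1:]:
        edges = [(i, j, w[(a, b)]) for i, a in enumerate(prev_layer)
                 for j, b in enumerate(layer) if (a, b) in w]
        best = {t: min(best[s] + sum(wt * d[s[i]][t[j]] for i, j, wt in edges)
                       for s in best)
                for t in states(layer)}
        prev_layer = layer
    return min(best.values())

def brute_force(layers, w, d, pinned):
    """Reference answer: try every placement of every free vertex."""
    n = len(d)
    free = [v for layer in layers for v in layer if v not in pinned]
    best = float("inf")
    for choice in product(range(n), repeat=len(free)):
        place = dict(pinned)
        place.update(zip(free, choice))
        best = min(best, sum(wt * d[place[a]][place[b]]
                             for (a, b), wt in w.items()))
    return best

# Hypothetical instance: path network 0 - 1 - 2 with unit links.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
layers = [["s1", "s2"], ["m1", "m2"], ["t"]]
pinned = {"s1": 0, "s2": 2, "t": 1}
w = {("s1", "m1"): 1, ("s2", "m1"): 1, ("s1", "m2"): 2,
     ("s2", "m2"): 1, ("m1", "t"): 1, ("m2", "t"): 3}

dp_value = layered_min_cost(layers, w, d, pinned)
print(dp_value, brute_force(layers, w, d, pinned))  # 5 5
```

Each layer transition compares all pairs of placements of two consecutive layers, which is where the $n^{2W}$ factor in the running time of such a scheme comes from.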

The complexity of the algorithm of Corollary 4 is exponential in the width of a layer, so the algorithm cannot be applied to layered graphs with unbounded width. We now present a procedure to get an $O(F)$-approximation of the MinCost($\bar{C}$) problem for a computation graph $\mathcal{G}$ which has a spanning tree $T$ such that any edge of $T$ is a part of at most $O(F)$ fundamental cycles. A fundamental cycle is a cycle created by adding to $T$ an edge of $\mathcal{G}$ that is not in $T$. For every such edge $uv$ there is a unique such cycle, formed by $uv$ and the edges of $T$.

Theorem 7. Given an arbitrary network $\mathcal{N}$ and a computation graph $\mathcal{G}$ with a spanning tree $T$ such that any edge of $T$ is a part of at most $O(F)$ fundamental cycles, there is a polynomial time $O(F)$-approximation algorithm to solve the MinCost($\bar{C}$) problem.


Proof: Let $T$ be the spanning tree of $\mathcal{G}$ such that any of its edges is a part of at most $O(F)$ fundamental cycles. Recall that polynomial time algorithms to find the optimal solution of MinCost($\bar{C}$) when the computation graph is a tree are known in the literature Q, [29]. Using any of these algorithms we can find the optimal solution of MinCost($\bar{C}$) for $T$ on $\mathcal{N}$. Let this optimal R-Embedding for $T$ be opt($T$) with cost $\bar{C}(T)$. Note that the R-Embedding opt($T$) gives a mapping for each vertex of $\mathcal{G}$ on $\mathcal{N}$. We create an R-Embedding $X$ for $\mathcal{G}$ from opt($T$) as follows: map each edge $(u, v) \in \Gamma$ to the shortest path between its mapped end points in opt($T$). In this way the edges of $\mathcal{G}$ which are in $T$ are mapped to the same paths as in opt($T$). It is easy to observe that $X$ is a valid R-Embedding for $\mathcal{G}$, with cost $\bar{C}(X)$. Let the optimal solution of the MinCost($\bar{C}$) problem for $\mathcal{G}$ on $\mathcal{N}$ be opt($\mathcal{G}$) with cost $\bar{C}(\mathrm{opt}(\mathcal{G}))$. It is easy to observe that the mapping of the edges of $\mathcal{G}$ which are in $T$ under the R-Embedding opt($\mathcal{G}$) gives a valid R-Embedding of $T$ on $\mathcal{N}$. Writing $\phi_{uv}(\mathcal{E})$ for the cost contributed by an edge $uv$ under an R-Embedding $\mathcal{E}$, we thus get

$$\bar{C}(T) \le \sum_{uv \in T} \phi_{uv}(\mathrm{opt}(\mathcal{G})) \le \bar{C}(\mathrm{opt}(\mathcal{G})).$$

Also, by the definition of opt($\mathcal{G}$),

$$\bar{C}(\mathrm{opt}(\mathcal{G})) \le \bar{C}(X).$$

The cost of $X$ can be written as $\bar{C}(X) = \sum_{uv \in T} \phi_{uv}(X) + \sum_{uv \notin T} \phi_{uv}(X)$, where the first sum equals $\bar{C}(T)$. For every edge $uv \notin T$ there is a path $\sigma_{uv}$ in $T$ between $u$ and $v$. As an edge $uv \notin T$ is mapped to the shortest path between its mapped end points in $X$, we get

$$\sum_{uv \notin T} \phi_{uv}(X) \le \sum_{uv \notin T} \Big( \sum_{e \in \sigma_{uv}} \phi_{e}(X) \Big) \le O(F)\,\bar{C}(T),$$

where the last inequality is due to the property of $T$. Finally we get $\bar{C}(X) \le \bar{C}(T) + O(F)\,\bar{C}(T) \le O(F)\,\bar{C}(T) \le O(F)\,\bar{C}(\mathrm{opt}(\mathcal{G}))$. This proves that the R-Embedding $X$ is an $O(F)$-approximation of opt($\mathcal{G}$). ■

Fig. 7: (a) FFT structure for 4 sources, (b) A spanning tree of the graph shown in (a)

Using this algorithm with the procedure described in Theorem 3 we get the following result.

Corollary 5. Given an arbitrary network graph $\mathcal{N}$ with non-negative capacities on its edges and a computation graph $\mathcal{G}$ with a spanning tree in which every edge is a part of at most $O(F)$ fundamental cycles, there is an $O(FD)$-approximation algorithm to solve R-CALP, where $D$ is the maximum out-degree of any vertex in $\mathcal{G}$.

An example of such a graph is the computation graph for the fast Fourier transform (FFT). An FFT graph for $k$ input sources can be represented by a layered graph of $r = \log(k)$ layers with $W = k$ vertices on each layer. Fig. 7(a) shows an FFT computation graph for 4 sources and its spanning tree is shown in Fig. 7(b). It is easy to observe that in such a spanning tree of any FFT structure any edge is a part of at most $O(\log(k))$ fundamental cycles. This gives an $O(\log(k))$-approximation for R-CALP with the $k$-point FFT computation graph.
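The quantity $F$ can be measured directly for a given spanning tree. The sketch below builds a butterfly graph, takes a breadth-first spanning tree, and counts for each tree edge the number of fundamental cycles through it. Several things here are assumptions of the sketch: the butterfly is modelled with $\log_2 k$ stages joining $\log_2 k + 1$ layers of $k$ vertices, no separate sink is attached, and the BFS tree is not necessarily the tree of Fig. 7(b), so the maximum load it reports need not match the bound stated above.

```python
from collections import Counter, deque

def butterfly_edges(k):
    """Edges of a butterfly on k = 2**m inputs: vertex (l, i) of layer l is
    joined to (l + 1, i) and (l + 1, i XOR 2**l)."""
    m = k.bit_length() - 1
    edges = []
    for l in range(m):
        for i in range(k):
            edges.append(((l, i), (l + 1, i)))
            edges.append(((l, i), (l + 1, i ^ (1 << l))))
    return edges

def fundamental_cycle_load(edges, root):
    """Return a BFS spanning tree and, for each tree edge, the number of
    fundamental cycles (one per non-tree edge) that pass through it."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}
    load = Counter({e: 0 for e in tree})
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue
        while u != v:                        # climb to the lowest common ancestor
            if depth[u] < depth[v]:
                u, v = v, u
            load[frozenset((u, parent[u]))] += 1
            u = parent[u]
    return tree, load

edges = butterfly_edges(4)
tree, load = fundamental_cycle_load(edges, root=(0, 0))
non_tree = len(edges) - len(tree)
print(len(edges), len(tree), non_tree, max(load.values()))
```

The same `fundamental_cycle_load` routine works for any connected computation graph given as an undirected edge list, so it can be used to compare candidate spanning trees by their maximum load.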


C. QIP for MinCost($\bar{C}$) and its LP relaxation

In this section we present a quadratic integer program to solve the MinCost($\bar{C}$) problem and its linear programming relaxation. A similar quadratic integer program for MinCost($\bar{C}$) has been presented in [28]. Then we show how the algorithms of [7] for 0-extension can be extended to get approximation algorithms for MinCost($\bar{C}$), which in turn gives an approximation algorithm for R-CALP.

The quadratic integer program for the MinCost($\bar{C}$) problem is shown below. It is easy to verify that the objective function is the same as Equation (8), where $d(u, v)$ is the shortest distance between vertices $u, v$ in the network graph. Recall that in an R-Embedding a vertex of the computation graph is mapped to only one vertex in the network graph. Thus for each vertex $\alpha \in \Omega, u \in V$ we define a binary variable $x_{\alpha u}$, which takes the value one if and only if $\alpha$ is mapped to $u$ in the embedding which minimizes the objective function. The embedding constraints ensure that each vertex $\alpha$ is mapped to only one of the vertices in $V$. Likewise, the source and sink constraints ensure that the sources and sink of the computation graph are mapped to the corresponding sources and sink in the network graph.

Quadratic Integer Program for MinCost($\bar{C}$) [28]

Objective: $\min \sum_{(\alpha, \beta) \in \Gamma} w(\alpha, \beta) \Big( \sum_{u, v \in V} x_{\alpha u}\, d(u, v)\, x_{\beta v} \Big)$ subject to

1) Source constraints

$x_{\alpha u} = 1$ if $\alpha = \omega_i$ and $u = s_i$, $\forall i \in [1, k]$

2) Sink constraint

$x_{\alpha u} = 1$ if $\alpha = \omega_p$ and $u = t$

3) R-Embedding constraints

$\sum_{u \in V} x_{\alpha u} = 1 \quad \forall \alpha \in \Omega$

4) Binary constraints

$x_{\alpha u} \in \{0, 1\} \quad \forall \alpha \in \Omega, u \in V$
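To make the role of the binary variables concrete, the following sketch evaluates the quadratic objective for one-hot vectors and minimises it by exhaustive search. It is an illustration only: the helper names, the three-vertex network and the small computation graph are hypothetical, and brute force stands in for an actual integer programming solver. The network is assumed connected so that all distances are finite.

```python
from itertools import product

def floyd_warshall(n, x):
    """All-pairs shortest distances d(u, v) from undirected link weights x."""
    d = [[0 if i == j else float("inf") for j in range(n)] for i in range(n)]
    for (u, v), cost in x.items():
        d[u][v] = d[v][u] = min(d[u][v], cost)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

def qip_objective(X, w, d):
    """Sum over edges (a, b) of w(a, b) * sum_{u, v} X[a][u] * d[u][v] * X[b][v]."""
    n = len(d)
    return sum(wt * sum(X[a][u] * d[u][v] * X[b][v]
                        for u in range(n) for v in range(n))
               for (a, b), wt in w.items())

def one_hot(place, n):
    """Binary variables of a placement: X[a][u] = 1 iff a is mapped to u."""
    return {a: [1 if u == p else 0 for u in range(n)] for a, p in place.items()}

def solve_by_enumeration(vertices, pinned, w, d):
    n = len(d)
    free = [a for a in vertices if a not in pinned]
    best_value, best_place = float("inf"), None
    for choice in product(range(n), repeat=len(free)):
        place = dict(pinned)                 # source and sink constraints
        place.update(zip(free, choice))      # one network vertex per free vertex
        value = qip_objective(one_hot(place, n), w, d)
        if value < best_value:
            best_value, best_place = value, place
    return best_value, best_place

# Hypothetical instance: triangle network where the direct link 0-2 is expensive.
d = floyd_warshall(3, {(0, 1): 1, (1, 2): 1, (0, 2): 3})
w = {("s1", "m"): 2, ("s2", "m"): 1, ("m", "t"): 1}
pinned = {"s1": 0, "s2": 2, "t": 1}

value, place = solve_by_enumeration(["s1", "s2", "m", "t"], pinned, w, d)
direct = sum(wt * d[place[a]][place[b]] for (a, b), wt in w.items())
print(value, direct)  # 3 3
```

For a one-hot assignment the double sum over $u, v$ picks out the single term $d(u, v)$ between the two chosen network vertices, which is why the quadratic objective agrees with the direct shortest-path cost.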


Note that the objective function of the above QIP is a quadratic function of the binary variables $x_{\alpha u}$. We relax this QIP into a linear program by using the earthmover distance metric, which is very similar to the relaxation presented for the 0-extension problem in 0. Recall that the shortest distance $d(u, v)$ induces a metric on the vertex set $V$ of the network graph, and $|V| = n$. Given a metric $(V, d)$ on a set $V$, the earthmover distance extends the metric to the probability distributions over $V$. If a probability distribution $a := \{a_1, \ldots, a_n\}$ over $V$ is seen as the amount of dirt piled on each $i \in V$, then the earthmover distance between $a$ and a distribution $b := \{b_1, \ldots, b_n\}$ is the minimum cost of moving the dirt from configuration $a$ to $b$. The earthmover distance $d_{EM}(a, b)$ between two distributions can be found by solving the following flow problem.


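Objective: $d_{EM}(a, b) = \min \sum_{u, v \in V} d(u, v)\, f_{uv}$ subject to:

1) $\sum_{v \in V} f_{uv} = a_u \quad \forall u \in V$

2) $\sum_{u \in V} f_{uv} = b_v \quad \forall v \in V$

3) $f_{uv} \ge 0 \quad \forall u, v \in V$


In the flow problem above the variable $f_{uv}$ represents the amount of dirt to be moved from $u$ to $v$ while going from configuration $a$ to $b$.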

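To get the LP relaxation of the QIP we first replace the binary constraints by $0 \le x_{\alpha u} \le 1$ for each $\alpha \in \Omega, u \in V$ except for the sources and sink. Then we replace the term $x_{\alpha u} x_{\beta v}$ in the objective function by a variable $y_{\alpha u \beta v}$, resulting in the following objective function:

$$\min \sum_{(\alpha, \beta) \in \Gamma} w(\alpha, \beta) \Big( \sum_{u, v \in V} y_{\alpha u \beta v}\, d(u, v) \Big)$$

Multiplying the R-Embedding constraint by $x_{\beta v}$ and $x_{\alpha u}$ appropriately on both sides, we get the new constraints for the variables $y_{\alpha u \beta v}$ as: (1) $\sum_{u \in V} y_{\alpha u \beta v} = x_{\beta v}\ \forall \alpha \in \Omega, v \in V$ and $(\alpha, \beta) \in \Gamma$, (2) $\sum_{v \in V} y_{\alpha u \beta v} = x_{\alpha u}\ \forall \beta \in \Omega, u \in V$ and $(\alpha, \beta) \in \Gamma$.

Let $x_{\alpha} := \{x_{\alpha 1}, \ldots, x_{\alpha n}\}$ be an $n$-dimensional vector whose element $x_{\alpha i}$ corresponds to the variable $x_{\alpha i}$ for $i \in V$. Together with the R-Embedding constraints, $x_{\alpha}$ for each $\alpha \in \Omega$ can be seen as a probability distribution over the set of network vertices $V$, and the variables $y_{\alpha u \beta v}$ can be seen as the flow variables of the flow problem that computes the earthmover distance between the configurations $x_{\alpha}$ and $x_{\beta}$ for each $(\alpha, \beta) \in \Gamma$. Thus $\min \sum_{u, v \in V} y_{\alpha u \beta v}\, d(u, v) = d_{EM}(x_{\alpha}, x_{\beta})$, and we can write the LP relaxation as follows: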


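Earthmover based linear program for MinCost($\bar{C}$)

Objective: $\min \sum_{(\alpha, \beta) \in \Gamma} w(\alpha, \beta)\, d_{EM}(x_{\alpha}, x_{\beta})$ subject to

1) Source constraints

$x_{\alpha u} = 1$ if $\alpha = \omega_i$ and $u = s_i$, $\forall i \in [1, k]$

2) Sink constraint

$x_{\alpha u} = 1$ if $\alpha = \omega_p$ and $u = t$

3) R-Embedding constraints

$\sum_{u \in V} x_{\alpha u} = 1 \quad \forall \alpha \in \Omega$

4) Non-negativity constraints

$0 \le x_{\alpha u} \le 1 \quad \forall \alpha \in \Omega, u \in V$


Note that we do not write the flow constraints on $y_{\alpha u \beta v}$ corresponding to $x_{\alpha}, x_{\beta}$ here, but they are taken into account in computing $d_{EM}(x_{\alpha}, x_{\beta})$ while solving this LP.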

Let opt(LP) and opt(QIP) be the optimal objective function values of the LP relaxation and of the QIP for MinCost($\bar{C}$), respectively. Observe that any solution of the QIP for MinCost($\bar{C}$) is also a solution of this LP; thus opt(LP) $\le$ opt(QIP). If we can find a polynomial time rounding procedure which rounds the solution corresponding to opt(LP) to a QIP solution $x$ such that the objective function value sol($x$) of $x$ satisfies sol($x$) $\le \alpha\,$opt(QIP), then we have an $\alpha$-approximate solution for the MinCost($\bar{C}$) problem.

The authors of [7] gave two randomized rounding algorithms for the 0-extension problem, where the LP relaxation is based on the semi-metric concept. The first rounding procedure of [7] gives an $O(\log |T|)$-approximation for an arbitrary graph $\mathcal{G} = (\Omega, \Gamma)$, where $T \subseteq \Omega$ is the set on which the metric is given. Recall that the 0-extension problem can be seen as a special case of the MinCost($\bar{C}$) problem with the network graph $\mathcal{N} = (V, E)$ as a complete graph on the vertices of $T$, with edge weights following the given metric, and the computation graph as $\mathcal{G}$. The semi-metric LP relaxation allows the mapping of vertices of $\mathcal{G}$ on an arbitrary metric containing the given metric. The semi-metric LP relaxation cannot be directly extended to the MinCost($\bar{C}$) problem, but the rounding algorithms of [7] work for our earthmover based LP relaxation. Thus, for an instance of the MinCost($\bar{C}$) problem in which the number of vertices in $\mathcal{N}$ is equal to the number of sources and sink (in other words, there are no intermediate nodes in $\mathcal{N}$ and $|V| = |T|$), the first rounding procedure of [7] gives an $O(\log |V|)$-approximation. In general, for any MinCost($\bar{C}$) instance, $|V| \ge |T|$.

We applied the rounding procedure of [7] to a general instance of MinCost($\bar{C}$) and obtained an $O(\log |V|)$-approximation for that as well. Recall that the optimal solution of the earthmover LP gives a vector $x_{\alpha} = \{x_{\alpha 1}, \ldots, x_{\alpha n}\}$ of length $|V| = n$ for each vertex $\alpha \in \Omega$. The vector $x_{\alpha}$ is a probability distribution over $V$, where an element $x_{\alpha u}$ represents the probability with which vertex $\alpha$ of $\mathcal{G}$ can be mapped to $u$ of $\mathcal{N}$. Thus each of its elements may have a fractional value, except for the source and sink vectors, which have integral values due to the corresponding constraints. Let $X_u := \{0, \ldots, 1, 0, \ldots, 0\}$ be the integral probability distribution over $V$ in which the whole mass is concentrated on the vertex $u \in V$. To find an integral solution corresponding to the fractional solution obtained from the LP, the rounding procedure first finds a subset of $V$ which is closest to $x_{\alpha}$ by computing the earthmover distance $d_{EM}(x_{\alpha}, X_u)\ \forall u \in V$. Then, parsing the vertices of $V$ in the order of a random permutation of $V$, it assigns a vertex $\alpha$ to a vertex $u$ of $V$ if $u$ is close to the subset found earlier for $\alpha$. Carrying out the analysis along the lines of [7], we observe that this rounding procedure gives a solution $x$ of the QIP such that sol($x$) $\le O(\log n)\,$opt(QIP). Combining this with the results of Theorems 6 and 3 we get the following result.

Corollary 6. Given an arbitrary network graph $\mathcal{N}$ with non-negative capacities on its edges and a computation graph $\mathcal{G}$ in which the out-degree of any vertex is at most $D$, there is a polynomial time $O(D \log n)$-approximation algorithm to solve R-CALP, where $n$ is the number of vertices in $\mathcal{N}$.
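The rounding step can be sketched as follows. This is one possible reading of the procedure, modelled on the usual random-permutation scheme for 0-extension, and the details are assumptions: the threshold is taken as $\delta$ times the smallest earthmover distance from $x_{\alpha}$ to any point mass, with $\delta$ drawn from $[1, 2)$, and the names `emd_to_point` and `round_solution` and the fractional solution are hypothetical. The earthmover distance to a point mass $X_u$ needs no LP, because all mass must travel to $u$.

```python
import random

def emd_to_point(x_a, u, d):
    """Earthmover distance between distribution x_a and the point mass on u:
    every unit of mass at v is moved to u at cost d[v][u]."""
    return sum(mass * d[v][u] for v, mass in enumerate(x_a))

def round_solution(x, d, rng):
    """Assign each computation vertex to the first network vertex, in a random
    order, whose point mass is within delta times the closest distance."""
    n = len(d)
    delta = 1.0 + rng.random()               # random parameter in [1, 2)
    order = list(range(n))
    rng.shuffle(order)                       # one permutation shared by all vertices
    placement = {}
    for a, x_a in x.items():
        dist = [emd_to_point(x_a, u, d) for u in range(n)]
        radius = delta * min(dist)
        placement[a] = next(u for u in order if dist[u] <= radius)
    return placement

# Hypothetical fractional solution on the path network 0 - 1 - 2.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
x = {"s": [1.0, 0.0, 0.0],                   # source, integral by constraint
     "m": [0.5, 0.5, 0.0],                   # fractional intermediate vertex
     "t": [0.0, 0.0, 1.0]}                   # sink, integral by constraint

placement = round_solution(x, d, random.Random(0))
print(placement)
```

Vertices whose LP vector is already integral have closest distance zero, so they keep their position; only fractional vectors such as that of `m` are affected by the random order and by $\delta$.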

In the second rounding procedure of [7] the authors exploit the structural properties of the given graph $\mathcal{G}$ and give an $O(1)$-approximation when $\mathcal{G}$ is planar. A common example of a planar computation graph is that of the correlation function. A correlation function over $k$ sources is defined as $f = \sum_{i} X_i X_{i+1}$. Observe that it can be represented as a planar layered graph. The second rounding procedure of [7] can also be applied to our earthmover LP. The analysis of this rounding procedure depends only on the structure of the graph $\mathcal{G}$ and not on the number of vertices of $\mathcal{N}$, so the same analysis also works in our case. This leads to the following result.

Corollary 7. Given an arbitrary network graph $\mathcal{N}$ with non-negative capacities on its edges and a planar computation graph $\mathcal{G}$ in which the out-degree of any vertex is at most $D$, there is a polynomial time $O(D)$-approximation algorithm to solve R-CALP.

The approximation algorithms described in this section are summarized in Table I.

Computation graph ($\mathcal{G}$) | Approximation factor | Result
----------------------------------|----------------------|-------
Layered graph with constant width ($W = O(1)$) | $O(W)$ | Corollary 4
Graph with a spanning tree in which every edge is a part of $O(F)$ fundamental cycles | $O(FD)$ | Corollary 5
Arbitrary graph with out-degree at most $D$ | $O(D \log n)$ | Corollary 6
Planar graph with out-degree at most $D$ | $O(D)$ | Corollary 7

TABLE I: Approximation algorithms for R-CALP for a specific computation graph ($\mathcal{G}$) and an arbitrary network graph ($\mathcal{N}$) with $n$ vertices


Here close is defined by a random parameter $\delta \in [1, 2)$, and $\alpha$ is assigned to $u$ if it is the first vertex in the permutation which is within distance $\delta$ from the subset found earlier for $\alpha$.

VII. Discussion

In this work we studied the problem of finding a maximum rate schedule to compute a function $f$ on a capacitated network $\mathcal{N}$ when the computation schema for $f$ is given by a DAG $\mathcal{G}$. We proved that solving this problem is MAX SNP-hard in general and presented some polynomial time approximation algorithms for a restricted class of schedules. Algorithmic lower bounds have been obtained for many known NP-hard problems under assumptions on the exponential running time of algorithms for the satisfiability (SAT) problem [21]. These assumptions are called the Exponential Time Hypothesis (ETH) and the Strong Exponential Time Hypothesis (SETH). SETH and ETH have led to tight lower bounds for several graph problems on bounded treewidth graphs (with running time exponential in the treewidth). It will be interesting to investigate the maximum rate problem under ETH and SETH. We provided some polynomial time approximation algorithms for the minimum cost embedding problem here, but we did not investigate the parameterized complexity ii of the problem. Possible parameters for the minimum cost embedding problem could be the treewidth of $\mathcal{G}$ or the number of sources in $\mathcal{G}$. Finding algorithms which are exponential only in the size of the fixed parameter but polynomial in the size of the input can enhance the understanding of the minimum cost embedding problem and help us design better algorithms for a general class of $\mathcal{G}$.

Appendix A

Properties of an embedding

Recall that an embedding maps an edge $\gamma$ to a set of paths such that the function carried by it, say $\theta$, is computed by the start node of each path and is used by the end node of the path to generate the successor function. Thus any edge in $\mathcal{G}$ which starts from a source vertex $\omega_i$ should be mapped to a path in $\mathcal{N}$ which starts from $s_i$ (item 1 of Definition 1). Similarly, any incoming edge of the sink vertex $\omega_p \in \Omega$ should be mapped to paths which end at the sink $t \in V$ (item 2 of Definition 1). According to a computation event in $\mathcal{N}$, any vertex $u \in V$ can compute a symbol of a function $\theta$ at time $\tau$ if the corresponding symbols of all its predecessor functions are available at $u$. Thus, for every edge $\gamma$ of $\mathcal{G}$, the end point of one of the paths to which its predecessor edges are mapped should be the same as the start point of a path to which $\gamma$ is mapped, and vice versa (item 3 of Definition 1).

Fig. 8 shows some valid path structures to embed an edge $\gamma \in \Gamma$ in $\mathcal{N}$. In the structures shown in Figs. 8(b) and 8(c), the function $\theta$ is computed only once (by node $a$) but used at two different nodes to compute the same successor function. Such an embedding is shown in Fig. 2 of Example 2. Similarly, in the embedding structure of Fig. 8(d), function $\theta$ is computed at two nodes and used by two different nodes in $\mathcal{N}$.

In any valid embedding the same symbol of any function $\theta$ should not be carried by an edge in $\mathcal{N}$ multiple times or received by a node multiple times (items 4 and 5 of Definition 1). Figs. 9(b), (c), (d) correspond to structures in which the function $\theta$ is carried multiple times by an edge (edge $(c, d)$ in Figs. 9(b), (c)) or received multiple times by a node (node $c$ in Fig. 9(d)). These structures will not occur in any valid embedding.

Fig. 8: An edge in $\mathcal{G}$ and structures of its valid embedding. (a) An edge $\gamma$ in $\mathcal{G}$ (b) $\mathcal{E}(\gamma) = \{abc, abd\}$ (c) $\mathcal{E}(\gamma) = \{ab, ac\}$ (d) $\mathcal{E}(\gamma) = \{ab, cd\}$

Fig. 9: An edge in $\mathcal{G}$ and structures of its invalid embedding. (a) An edge $\gamma$ in $\mathcal{G}$ (b) $\mathcal{E}(\gamma) = \{acde, bcdf\}$ (c) $\mathcal{E}(\gamma) = \{acd, bcd\}$ (d) $\mathcal{E}(\gamma) = \{ac, bc\}$


Appendix B

Proof of lemmas of Section IV-C


A. Proof of Lemma 1

1) Observe that each source vertex of type $s^{i}_{xy}$ in Fig. 3(b) has exactly one outgoing edge of weight 4 and $\omega_p$ has only incoming edges.

2) This directly follows from Figs. 3(b) and 4(b).

3) First observe that the graph shown in Fig. 3(b) has no directed cycles. Moreover, the gadget of Fig. 4(b) does not add any directed cycle either. This shows that every gadget which replaces an edge $(x, y) \in E_H$ is a DAG. Observe that any vertex $x \in V_H$ is a part of exactly three such gadgets (one for each of its edges). Thus $x$ has incoming edges from 6 sources and has outgoing edges to the intermediate vertices inside these gadgets. All the intermediate vertices of a gadget finally lead to the sink $\omega_p$. There are no edges across these gadgets, which ensures that the whole $\mathcal{G}$ is also a DAG.

4) Every source vertex has exactly one outgoing edge of weight 4 and every intermediate vertex of the gadget, i.e., $a_{xy}, b_{xy}, c_{xy}, d_{xy}$, has exactly 2 outgoing edges. Every vertex $x \in V_H$ is a part of exactly three gadgets and thus has exactly 6 outgoing edges (two from each gadget).

5) All outgoing edges of any source have weight 4. Every vertex $x \in V_H$ in Fig. 3(b) has six outgoing edges of weight one; thus after applying the gadget of Fig. 4(b), it has six outgoing edges of weight $6 \times 1 + 1 = 7$. Similarly, the intermediate vertices have two outgoing edges of weight $2 \times 4 + 1 = 9$. Thus every edge has bounded weight and the maximum weight of any edge is 9.


B. Proof of Lemma 2

Let $\mathcal{E}$ be the minimum cost embedding of $\mathcal{G}$ on $\mathcal{N}$, of cost $C$, in which one (or more) edge of weight $z$ from the gadget of Fig. 4(b) is exposed. In other words, in embedding $\mathcal{E}$ some $u_i$ is mapped to a vertex in $\mathcal{N}$ to which $u$ is not mapped. We modify $\mathcal{E}$ by mapping $u_i$ to the vertex where $u$ is mapped. The modified embedding $\mathcal{E}'$ always has cost less than the cost of $\mathcal{E}$, which contradicts the fact that $\mathcal{E}$ is the minimum cost embedding. We explain one such case in detail below.

1) Consider the case when $\mathcal{E}(u) = \alpha$, $\mathcal{E}(u_1) = \mathcal{E}(u_2) = \beta$, $\mathcal{E}(w_1) = \gamma$ and $\mathcal{E}(w_2) = \delta$. In other words, only one of the weight $z$ edges is exposed but both the edges of weight $l_1$ and $l_2$ are exposed. Let $y(\alpha, \beta) = y_1$, $y(\beta, \gamma) = y_2$ and $y(\beta, \delta) = y_3$. Then the cost of embedding $\mathcal{E}$ coming from this structure is $C = y_1 z + y_2 l_1 + y_3 l_2$. Now consider the embedding $\mathcal{E}'$ where $u_1, u_2$ are mapped to $\alpha$, keeping all the other vertices at the same location as in $\mathcal{E}$. Note that $y(\alpha, \gamma) \le y_1 + y_2$ and $y(\alpha, \delta) \le y_1 + y_3$. The cost of $\mathcal{E}'$ is $C' \le (y_1 + y_2) l_1 + (y_1 + y_3) l_2 \le 2 y_1 \max(l_1, l_2) + y_2 l_1 + y_3 l_2 < y_1 z + y_2 l_1 + y_3 l_2 = C$. Thus we have an embedding $\mathcal{E}'$ in which none of the weight $z$ edges is exposed and which has cost strictly less than that of $\mathcal{E}$.

The embedding $\mathcal{E}'$ and its cost can be computed in a similar manner for the other cases of the mappings of the various vertices, with $C' < C$.

C. Proof of Lemma |5] 

A 3-way multiterminal cut of a graph is the problem of partitioning the vertices into three parts such that each part has 
exactly one terminal and the weight of the multiterminal cut (defined as the sum of the weights of edges across the parts) is 
minimized. 

Recall that the network graph N created in Theorem |4] is a complete graph on three vertices, namely Si,S 2 ,t, with unit 
edge weights. We create an embedding £ of the gadget from a 3-way cut with weight W of Fig. |3!a) as follows: Map the 
vertices which are with Sixy in the cut to Si in the embedding. Similarly map a vertex to S 2 or t if it is with S 2 xy or ujp in 
the cut, respectively. Map the intermediate vertices Ui,... ,U 2 of Fig. |4jb) to wherever u is being mapped by the earlier step. 
It is easy to observe that f is a valid embedding of the gadget. 

Now we show that the cost of £ is W. Recall that the cost of an embedding is defined by Equation (|2|i and an edge of the 
gadget is said to be exposed if its weight is counted in computing the cost of the embedding. In the following arguments we 
show that an edge of Fig. |3b) is exposed in the embedding iff the corresponding edge of Fig. I^^a) is in the cut. 

1) Consider an edge (Sixy,*) of Fig. Oa). If it is in the cut then its end points, i.e., Sixy and *, are in two separate 
partitions. This in turn implies that the vertex * of Fig. [^b) is not mapped to in embedding £ and the edge {S^^y, *) 
is exposed in £. Similarly, if an edge {S 2 xy,*) of Fig. [3a) is in the cut then the corresponding edge {S^xy,*) of 
Fig- 13b) is exposed in £. Note that weights of (Sixy,*) (Fig. |3a)) and {S*^y,*) (Fig. |3b)) for i G {1,2} are same 
thus contributing to the same weight in the cut as well as the cost of £. 

2) Now consider the edges (x, Oxy) and {x, Cxy) of Fig. |3a). If both the edges are in the cut then there are two possibilities: 
either x,axy,Cxy all are in separate partitions or x is in one partition but axy,Cxy are together in different partition. 
Observe the corresponding edges in Fig. |3b). They are replaced by the structure of Fig. |4}b) with a^y, c^y as intermediate 
vertices between x and axy,Cxy respectively. Note that under embedding £, vertices a^y,c^y are mapped wherever x 
is mapped and axy,Cxy are mapped to either different or same vertices (depending on them being in different or same 
partitions in the cut). In either case the edges {a^y,axy) and {c^y,Cxy) are exposed in the embedding if {x,axy) and 
{x,Cxy) of Fig. 13 a) are in the cut thus contributing to the same weight in f’s cost. Same argument holds for all the 
outgoing edges from vertices x, y, Oxy, bxy, Cxy, dxy of Fig. |3b). 

3) Finally note that an edge of Fig. |3b) is exposed only if its end points are mapped to different vertices in £ which in 
turn implies that the corresponding edge of Fig. |3a) is in cut. The weight z edges of Fig. |3b) are never exposed in £ 
as their endpoints are always mapped to same vertex in £. 

This proves that the cost of $\mathcal{E}$ is indeed $W$, which is the same as the weight of the 3-way cut. 
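The correspondence between exposed edges and cut edges can be checked mechanically on a small instance. The sketch below is only an illustration under stated assumptions: it uses a toy weighted graph (not the gadget of Fig. 3), a three-vertex network with unit distance between distinct vertices, and an assumed cost model in which an edge contributes its weight times the distance between the network vertices hosting its end points. All names (`EDGES`, `FIXED`, `embedding_cost`, `cut_weight`) are hypothetical.

```python
from itertools import product

# Hypothetical three-vertex network, unit distance between distinct vertices.
HOSTS = ["S1", "S2", "t"]
DIST = {a: {b: 0 if a == b else 1 for b in HOSTS} for a in HOSTS}

# Toy weighted graph, NOT the gadget of Fig. 3: (tail, head, weight).
EDGES = [("s1", "x", 4), ("s2", "x", 4), ("x", "a", 1), ("a", "sink", 4)]
# Vertices with a fixed placement; they play the role of the terminals.
FIXED = {"s1": "S1", "s2": "S2", "sink": "t"}
FREE = ["x", "a"]


def embedding_cost(edges, host, dist):
    """Assumed cost model: sum of w(u, v) * dist(host[u], host[v])."""
    return sum(w * dist[host[u]][host[v]] for u, v, w in edges)


def cut_weight(edges, part):
    """Total weight of edges whose end points lie in different partitions."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def all_placements():
    """Enumerate every single-mapping placement of the free vertices."""
    for choice in product(HOSTS, repeat=len(FREE)):
        yield {**FIXED, **dict(zip(FREE, choice))}


# A single-mapping embedding is also a 3-way partition, and vice versa:
# an edge is exposed exactly when it crosses the partition.
costs = [embedding_cost(EDGES, h, DIST) for h in all_placements()]
cuts = [cut_weight(EDGES, h) for h in all_placements()]
assert costs == cuts
min_cost = min(costs)
```

With unit distances the two quantities coincide for every placement, which is the edge-by-edge statement argued above; the gadget structure of Fig. 3(b) and Fig. 4(b) is what is needed to make the same statement hold for the actual reduction.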

D. Proof of Lemma 3 

Recall that for every edge $(x, y) \in E_H$ there is a gadget of Fig. 3(b) (along with Fig. 4(b)) in $\mathcal{G}$ and the 
network graph $\mathcal{N}$ has only three vertices. Given an embedding $\mathcal{E}$ with multiple mappings for a vertex, we 
construct the embedding $\mathcal{E}'$ with a single mapping for every vertex in the following steps. 

1) If any intermediate vertex of Fig. 4(b) is mapped to multiple vertices, then in $\mathcal{E}'$ map all its copies to 
wherever $u$ is mapped in $\mathcal{E}$, keeping the rest of the vertices at the same place. This can only reduce the cost 
of the resulting embedding. 

2) Observe that the vertices $b_{xy}, c_{xy}$ of Fig. 3(b) have only one outgoing edge, which goes to $\omega_p$. As the 
mapping of $\omega_p$ is fixed to $t \in V$ in any valid embedding, the outputs of $b_{xy}, c_{xy}$ are required at only one 
vertex in the embedding. Thus, the operations performed at these nodes cannot be performed at multiple vertices in the 
network graph, and $b_{xy}, c_{xy}$ are not mapped to multiple vertices in any valid embedding. 

3) Consider the vertex $a_{xy}$ and let it be mapped to two vertices in $V$ under embedding $\mathcal{E}$. (Note that 
$a_{xy}$ has outgoing edges to $\omega_p$ and $b_{xy}$, and both are mapped to only one vertex under a valid embedding.) 
There are three possible mappings of $a_{xy}$ in this case, and we show that in each case mapping it to only one of the 
vertices does not increase the cost of the embedding. 

a) Let $a_{xy}$ be mapped to $S_2$ and $t$ under embedding $\mathcal{E}$. Create an embedding $\mathcal{E}'$ where $a_{xy}$ 
is mapped only to $t$, keeping the mapping of all the other vertices the same as in $\mathcal{E}$. Then 
$C(\mathcal{E}') \le C(\mathcal{E}) - w(S^1_{xy}, a_{xy})\gamma(S_1, S_2) + w(a_{xy}, b_{xy})\gamma(S_2, t) = C(\mathcal{E}) - 4 + 1 < C(\mathcal{E})$. 

b) Let $a_{xy}$ be mapped to $S_1$ and $t$ under $\mathcal{E}$. Create the embedding $\mathcal{E}'$ where $a_{xy}$ is mapped 
only to $S_1$, keeping the mapping of all the other vertices the same as in $\mathcal{E}$. Then 
$C(\mathcal{E}') = C(\mathcal{E}) - w(S^1_{xy}, a_{xy})\gamma(S_1, t) + w(a_{xy}, \omega_p)\gamma(S_1, t) = C(\mathcal{E}) - 4 + 4 = C(\mathcal{E})$. 

c) Let $a_{xy}$ be mapped to $S_1$ and $S_2$ under $\mathcal{E}$. Create the embedding $\mathcal{E}'$ where $a_{xy}$ is 
mapped only to $S_1$, keeping the mapping of all the other vertices the same as in $\mathcal{E}$. It is easy to observe that 
$C(\mathcal{E}') \le C(\mathcal{E}) - 3$ in this case. 

The vertex $d_{xy}$ can also be mapped to only one vertex by similar arguments. 

4) Now consider the vertex $x$ in the $(x, y)$ gadget. Since $x$ has two outgoing neighbors in this gadget (namely 
$a_{xy}, c_{xy}$) and each of them can be mapped to only one vertex, $x$ in turn can be mapped to at most two vertices for 
this gadget. We create the embedding $\mathcal{E}'$ of reduced cost as follows. 

a) Let $x$ be mapped to $S_1$ and $S_2$ under $\mathcal{E}$ for this gadget. Create the embedding $\mathcal{E}'$ where $x$ 
is mapped only to $S_1$, keeping the mapping of all the other vertices the same as in $\mathcal{E}$. Then 
$C(\mathcal{E}') = C(\mathcal{E}) - w(S^1_{xy}, x)\gamma(S_1, S_2) + w(x, a_{xy})\gamma(S_1, S_2) = C(\mathcal{E}) - 4 + 1 < C(\mathcal{E})$. 

b) Let $x$ be mapped to $S_1$ and $t$ under $\mathcal{E}$. Create $\mathcal{E}'$ where $x$ is mapped only to $S_1$, keeping 
the mapping of all the other vertices the same as in $\mathcal{E}$. It is easy to observe that 
$C(\mathcal{E}') \le C(\mathcal{E}) - 4 - 4 + 2 < C(\mathcal{E})$. Similarly, if $x$ is mapped to $S_2$ and $t$ then we get 
the new embedding by mapping it only to $S_2$. 

In this way, for any edge $(x, y)$, each vertex of the corresponding gadget can be mapped to only one vertex in 
$\mathcal{E}'$ and $C(\mathcal{E}') \le C(\mathcal{E})$. 

5) Recall that every $x \in V_H$ has three edges in $H$, thus $x$ is a part of three gadgets. So far we have made sure that, 
individually for each gadget, $x$ is mapped to only one vertex of $\mathcal{N}$, but it is possible that it is mapped to more 
than one vertex across the gadgets. Let $(x, y)$ and $(x, z)$ be two edges for whose gadgets $x$ is mapped to separate 
vertices in $\mathcal{E}$. Let $x$ be mapped to $S_1$ for the $(x, y)$ gadget and to $S_2$ for the $(x, z)$ gadget. Create 
the embedding $\mathcal{E}'$ where $x$ is mapped to $S_1$ for the $(x, z)$ gadget, keeping the mapping of all the other 
vertices the same as in $\mathcal{E}$. Observe that in embedding $\mathcal{E}$, to compute $x$ at $S_1$ the edges 
$(S^2_{xz}, x), (S^2_{xy}, x)$ are exposed, and to compute it at $S_2$ the edges $(S^1_{xz}, x), (S^1_{xy}, x)$ are exposed. 
In $\mathcal{E}'$, as $x$ is computed only at $S_1$, the edges $(S^1_{xz}, x), (S^1_{xy}, x)$ will not be exposed, thus 
reducing the cost of the embedding by 8. At the same time, at most the outgoing edges of $x$ from the $(x, z)$ gadget, i.e., 
$(x, a_{xz}), (x, c_{xz})$, might get exposed. Thus $C(\mathcal{E}') \le C(\mathcal{E}) - 8 + 2$. 

In this way we get an embedding $\mathcal{E}'$ in which each vertex of $\mathcal{G}$ is mapped to only one vertex of 
$\mathcal{N}$ and which has cost at most that of $\mathcal{E}$. 
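The cost differences quoted in step 3 can be reproduced numerically on a three-edge fragment around $a_{xy}$. The sketch below is not the paper's cost definition: it assumes that, when a vertex has several copies, every copy of a vertex fetches each input from the nearest copy of the producing vertex, and that distinct network vertices are at unit distance. The edge weights 4, 1, 4 are the ones appearing in step 3; the placements of $b_{xy}$ in each case, and the names `multi_cost` and `collapse`, are illustrative choices.

```python
# Hypothetical three-vertex network, unit distance between distinct vertices.
HOSTS = ["S1", "S2", "t"]
DIST = {a: {b: 0 if a == b else 1 for b in HOSTS} for a in HOSTS}

# Fragment around a_xy with the weights quoted in step 3: (tail, head, weight).
# "S1xy" stands for the source S^1_xy, "a" for a_xy, "b" for b_xy.
EDGES = [("S1xy", "a", 4), ("a", "b", 1), ("a", "omega", 4)]


def multi_cost(edges, hosts_of, dist):
    """Assumed model: each copy of v fetches input u from u's nearest copy."""
    total = 0
    for u, v, w in edges:
        for hv in hosts_of[v]:
            total += w * min(dist[hu][hv] for hu in hosts_of[u])
    return total


def collapse(hosts_of, vertex, keep):
    """Return a copy of the placement with `vertex` mapped only to `keep`."""
    new = {v: set(hs) for v, hs in hosts_of.items()}
    new[vertex] = {keep}
    return new


def delta(hosts_of, vertex, keep):
    """Change in cost when `vertex` is collapsed onto the single host `keep`."""
    before = multi_cost(EDGES, hosts_of, DIST)
    after = multi_cost(EDGES, collapse(hosts_of, vertex, keep), DIST)
    return after - before


base = {"S1xy": {"S1"}, "omega": {"t"}}
# Case a): a_xy at S2 and t, keep t (b_xy assumed at S2).
delta_a = delta({**base, "a": {"S2", "t"}, "b": {"S2"}}, "a", "t")
# Case b): a_xy at S1 and t, keep S1 (b_xy assumed at S1).
delta_b = delta({**base, "a": {"S1", "t"}, "b": {"S1"}}, "a", "S1")
# Case c): a_xy at S1 and S2, keep S1 (b_xy assumed at S2).
delta_c = delta({**base, "a": {"S1", "S2"}, "b": {"S2"}}, "a", "S1")
```

Under these assumptions the three differences come out as $-4 + 1$, $-4 + 4$ and $-3$, matching cases a), b) and c); other placements of $b_{xy}$ would have to be checked separately.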



